The purpose of this thesis was to analyze the central
site  portion  of  the  Personnel  Data  System,  and  propose  a 
distributed  processing  system  design  for  implementation.  A 
thorough  review  of  applicable  distributed  systems  theory  was 
to  be  the  basis  for  development  of  a  design  strategy  which 
would  employ  appropriate  concepts  to  improve  system  support 
for  all  users. 

The design developed is based on an analysis of the system
functions used at the terminals connected to the central site.
It proposes distributed clusters of terminals formed through an
allocation process that considers, in order of precedence, user
location, similarity of system function usage, and the office
location of individual terminals. Several areas of future
research were identified; employing some of the design
techniques described will first require expansion of current
system documentation.
Abstract

A  distributed  processing  system  design  strategy  was 
developed  and  applied  to  the  central  site  portion  of  the 
Personnel  Data  System  at  the  Air  Force  Manpower  and  Personnel 
Center (AFMPC), Randolph Air Force Base, Texas. A software
package  was  developed  to  support  building  of  a  user  database 
and  the  clustering  of  users  based  on  geographic  locations  and 
similarity  of  system  function  use.  Design  considerations 
such  as  data  redundancy  and  concurrency  controls  are  discussed, 
along  with  the  concept  of  Conflict  Graph  Analysis  developed 
to  support  the  System  for  Distributed  Databases. 

Software documentation and similarity tables depicting
the distributed system design involving 425 terminals
clustered into 46 user groupings are included. The design as
presented requires further development to include data access
requirements and definition of lower levels of system
function utilization. Development of a simulation package to
validate the final system design is recommended.
A DISTRIBUTED PROCESSING DESIGN FOR THE
PERSONNEL  DATA  SYSTEM'S  CENTRAL  SITE 


I  Introduction 


Background 

The Personnel Data System (PDS) central site, located
at the Air Force Manpower and Personnel Center (AFMPC),
Randolph AFB, Texas, is the hub of the world-wide PDS, which
provides up-to-date personnel information to all major Air Force
activities. The central site is the repository for personnel
records on every member of the total force Air Force community
(active duty military and civilian, Air National Guard, and
the Air Force Reserve). It provides on-line file access,
through more than 500 terminals and 16 minicomputers, to
personnel managers at the AFMPC, Pentagon, Air Reserve Personnel
Center, Headquarters Air Force Reserve, Office of Civilian
Personnel Operations, and most major commands and separate
operating agencies.

The  hardware  environment  comprising  the  current  central 
site  is  described  in  Appendix  A  (Ref  1:3-6).  All  of  this 
equipment  will  be  replaced  in  the  next  several  years  as  part 
of the AFMPC competitive reacquisition program (Ref 1). Much
of  the  design  strategy  developed  in  this  investigation  is 
based  on  utilization  of  the  equipment  capabilities  described 
for  reacquisition,  or  projected  capabilities  after  equipment 
upgrade. 


Problem  Statement 


User demands for new processing power have greatly increased
during the past twenty years, requiring several system
expansions and a continual search for areas of processing
optimization. Increased workloads have repeatedly degraded
system response time and produced poor turnaround time for batch
applications. Additionally, usage of many new applications,
developed in response to perceived user needs, has had to be
tightly controlled to prevent further adverse system impact.

Such current system limitations have generated considerable
interest in identifying ways to improve response to
all user needs. One such area of interest is the feasibility
of applying distributed processing system concepts to the PDS
central site. Successful implementation of a distributed
system can offer significant benefits to users in terms of
improved response times, availability, and
flexibility/adaptability.

The  purpose  of  this  work  then  is  to  perform  an  analysis 
of  the  current  system,  propose  a  distribution  strategy,  and 
develop  a  distributed  system  design  supporting  central  site 
operations  and  providing  the  benefits  described  above  to  all 
users. 

Scope 

Processing currently encompassed in the PDS central
site (not ancillary processing at base level or on dedicated
equipment at other levels) will be the basis for analysis.
The central site environment forming the basis for this
analysis is portrayed in Figure 1-1. No effort will be made to
perform a communication network analysis, or to require
structural changes in the central site computer support
structure. Distributed nodes will be formed using a first
priority of location commonality to avoid increasing
communication requirements. Software development will be
limited to that required to support the distributed design
analysis process.

Assumptions  and  Constraints 

The design of a distributed system would typically include
tailoring the attributes of each node to the requirements
of the users to be supported. These attributes would
include numbers of terminals, disk storage, tape drives,
memory, and card readers/punches. Many of these attributes
have been defined to some extent by the AFMPC reacquisition
description of the remote batch terminals which will serve
as the basis for the distributed system node computers. These
attributes are included below as design constraints.

- Any system functions not included in the distributed
environment are dependent on central site capabilities
exclusively, or are part of the Microform system which is
excluded from this analysis.

- The hardware attributes required for each distributed
system node must not exceed the projected capabilities of the
remote batch terminal (RBT) in the AFMPC reacquisition
program. These capabilities include: 100 MB intermediate
storage, communications support for one or two 9600 BPS lines,
operator console, 9-track 1600 BPI magnetic tape drive, card
reader, and 10 KBPS communications capability for serving as
a concentrator for 4 to 16 CRT devices. The reacquisition
plan calls for a minimum order of twenty such devices, with
possible growth to 53.

Fig 1-1. PDS Central Site System

- Documentation of current system processing is available
to the extent of correlating terminal identifiers, usercodes,
system functions or groups of functions, and personnel
transaction identifiers (PTI) input under the PERSTRANS system
function. Additional detailed information cannot be obtained
without implementing software monitors. Such software could
not be developed and implemented in the time available, but
will be discussed in Chapter V for recommended future actions.

- Documentation of user processing requirements supported
by the Procurement Management System (PMS) is essentially
non-existent. Additionally, the interactive structure
of PMS, with on-line data change, demands considerable
detailed analysis of system operations to develop a system
strategy for data concurrency management and the prevention
of deadlocks and lost updates as discussed later in Chapter
II. Obtaining the necessary documentation would require
manpower costs at AFMPC which cannot be justified at this
time, precluding the inclusion of the PMS in the distributed
system design being developed.

-  The  lack  of  information  linking  each  terminal  user  to 
the  data  items  accessed  through  all  system  functions  prevents 
the  expansion  of  the  similarity  table  and  use  of  conflict 
graph  analysis  processes  to  further  develop  the  detailed 
design. 

General  Approach 

Three initial efforts were begun concurrently as the
project was started. First was a review of available
literature in the area of distributed systems design and
implementation. The results of the review are documented in
Chapter II, where the theoretical development is described.
Secondly, several sources of information concerning current
central site operations were identified and recent historical
data was requested. This information and some insights into
system evolution and future plans, based on past experience and
continuing contact with system managers at AFMPC, contributed
to the description developed in Chapter III. The third part
of these initial efforts was the design and implementation
of software which supports creation of the database used to
build descriptions of system users. Documentation of this
program is included as Appendix B, while the description of
program use is included in Chapter IV.

Upon completion of the terminal use database, similarity
matrices were created which portrayed the degree of
commonality among users in the same geographical locations. These
user groupings, or clusters, were established through an
iterative process using target minimal similarity values,
and provided the basis for applying the constraints of the
reacquisition hardware in arriving at the initial design.
The database employed in generating these clusters was
validated during a visit to AFMPC in August 1981, and was updated
to a current position as of 1 July 1981.
II Theoretical Basis


Introduction 

Distributed processing, distributed systems, dispersed
systems, networked systems, and distributed databases are
only a few of the terms which seek to define the concept of
providing computing resources closer to the intended user
community. Discussions of distributed processing concepts
reflect the continuing confusion that exists because a common
definition has not been found. Much of this discussion and
confusion arises because these terms, and many others
(Ref 11:87-104), refer to specific applications which lie
somewhere along a continuum of application possibilities. This
continuum results from the diverse needs of the thousands of
organizations whose computer processing capabilities evolve
as the organization evolves and as technological advances
provide capabilities previously unavailable or too costly.

Distributed  Processing  System  Structures 

Two distributed system structures are common to most
literature on this subject (Ref 4:39-43): horizontally and
hierarchically distributed systems. Most authors are eager
to avoid categories that are too narrow, and will also include
a "composite system", sometimes referred to as a hybrid system
(Ref 4:44).


Horizontally Distributed Systems. An example of a
horizontally distributed system, depicted in Figure 2-1,
consists of three processing centers which would probably have
been distributed primarily on the basis of location or
function. Such a structure would normally be created by
interfacing three previously autonomous processing centers to
increase the overall capabilities of the system by supporting
resource sharing, data exchange, and workload leveling between
system nodes. This type of distributed design can also
provide the capability for a higher headquarters computer to
extract data for consolidated reports listing information
from all organization functions.

Hierarchically Distributed Systems. While horizontal
distribution typically comes into being by interaction of
previously autonomous processing centers, the hierarchical
system, illustrated in Figure 2-2, results from the growth
of a single central processing center system. The expansion
of this centralized system might occur as the result of
growth in the number of applications required by system users.
Rather than adding increased processing power to the existing
processing center, the creation of subordinate centers may be
more attractive because of cost criteria, long range plans
for further additions to the system, or the fact that the
current system has reached the limits of its expansion
capabilities. The distributed processing center created in this
manner can be specialized in that it is designed to support
one or more specific applications which are a small subset of
the applications supported by the central, or host, computer.

Fig 2-1. Horizontally Distributed System

Fig 2-2. Hierarchical Distributed System

Typically, a hierarchical distribution is based on the
desire to distribute processing at the system level where it
can be accomplished most effectively as measured by a
cost/performance ratio. Some of the performance criteria which
may be important to this analysis are discussed later in this
chapter.

Hybrid Distributed Systems. This category of distributed
system structures is a kind of "catch-all" category for
systems that do not neatly fit into either of the previously
described categories. Most of the larger, more complex
systems would fall into this category, because they include
interconnection of generally autonomous processing centers,
some having subordinate hierarchically distributed systems,
and in some cases a headquarters host computer which extracts
reporting information from all portions of the system for use
by top level managers. As will be shown in Chapter III, the
current Personnel Data System is a member of this hybrid
category, and has been used as the model for creation of the
example of the hybrid system provided in Figure 2-3.

Fig 2-3. A Hybrid Distributed System


Results of Distribution

Before considering the possible techniques for developing
a distributed system, the analyst must be aware of the
advantages and disadvantages of distribution, and develop an
approach which meets the specific goals of the organization
being supported. Several of the advantages have been touched
upon in the preceding description of typical distributed
system structures. A system node which is specialized to
perform a certain function will often achieve very attractive
performance when compared to a multitalented node which must
perform a diverse number of tasks equally well. Additionally,
from a design viewpoint, as the complexity of a system
increases the problems with maintenance of the system will
also increase (Ref 20:18-21).

A second capability distributed systems provide is data
exchange or sharing, in that data not available at a certain
system node can be accessed over the system communications
network. This is especially important in systems with many
very large databases where too much data redundancy becomes
unacceptably expensive. A user needing a single data item
from a file can request that data for temporary use rather
than being required to possess a local replication of the
entire database. As with most system capabilities, the
advantages normally have countervailing disadvantages which
must be considered in the design analysis. As the number of
requests for data from non-local databases increases, the
communications traffic and cost will increase even though storage
costs would be lower, because of fewer applications. One
optimization technique computes system costs for storage and
transmission (Ref 7). The design can be structured to
minimize costs once unit storage and transmission costs, file
lengths, and request rates are available.
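
The tradeoff between storage and transmission costs can be illustrated with a deliberately simplified sketch. This is not the formulation of Ref 7: the function names, the cost model (one full copy stored per node, a flat cost per remote request, no update propagation costs), and all of the figures are assumptions chosen only to show how the four inputs named above interact.

```python
def storage_cost(file_length, unit_storage):
    """Cost per period of holding one full copy of a file at a node."""
    return file_length * unit_storage


def transmission_cost(request_rate, unit_transmission):
    """Cost per period of satisfying a node's requests from a remote copy."""
    return request_rate * unit_transmission


def cheaper_placement(file_length, request_rate, unit_storage, unit_transmission):
    """Return 'local copy' or 'remote access', whichever costs less per period."""
    local = storage_cost(file_length, unit_storage)
    remote = transmission_cost(request_rate, unit_transmission)
    return "local copy" if local < remote else "remote access"


# Illustrative figures only (not taken from the PDS): a 1,000,000-character
# file at 0.00001 per character costs about 10 per period to store locally.
print(cheaper_placement(1_000_000, 50, 0.00001, 0.05))   # few requests
print(cheaper_placement(1_000_000, 500, 0.00001, 0.05))  # many requests
```

Under these assumed figures the low request rate favors remote access and the high request rate favors a local copy, which is the balance the paragraph above describes.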

Load  leveling  is  another  benefit  to  be  derived  from  a 
distributed  system  structure.  Systems  that  evolve  over  time 
into  some  distributed  mode  often  provide  the  capability  to 
migrate  applications  or  workload  from  one  system  node  to 
another to relieve overloads or imbalances that develop.
Offsetting this advantage is the possibility of increased system
costs  when  peripherals  or  added  resources  must  be  placed  at 
a  system  node  to  support  possible  workload  migrations. 

Another major advantage is the ability of a user at a
distributed node to utilize some system processing capability
not available at that user's local node. In a properly
implemented system this capability would be transparent to the
user, who has no need to be concerned with where the actual
process is performed. This resource sharing capability is
faced with the same balancing of costs versus benefits
described for data sharing, since the request and possibly data
must be transferred between nodes to complete the action.

Other design issues may also be important in meeting
organizational goals. Access to data may be more easily
restricted in a distributed system, simply because
interconnection capabilities may be more easily controlled.
Response time for critical applications can be improved through
nodal specialization, and system availability is enhanced by
allowing users to be switched to other nodes or to the central
system when the local node fails.

Distribution of Users. A critical problem in developing
a distributed system design is the determination of who is
supported by which system node. Strict distribution based on
functional affiliation is a common approach which corresponds
to the previously described horizontal distribution
structure. The major adverse aspect of such a design
would appear to be the system's vulnerability to hardware
failures, because applications would most likely be available at
only a single system node. This problem can be alleviated if
the system also incorporates the hierarchical property, where
a node possesses a subset of the capabilities available at
the host node. This enables support at the host, or at other
nodes at the same horizontal system level, if the local node
becomes unavailable.

Distributions based on database requirements can also
encounter the insufficient redundancy problem unless some
amount of data replication is achieved across the system.
The amount of data redundancy required to achieve a desired
level of availability is easily derived: the probability that
a data item cannot be accessed is the probability of the local
copy being inaccessible multiplied by the probability of all
other possible sources being inaccessible. In the fully
connected distributed network, where every node is connected
to all other nodes in the system (Ref 4:65-66), the probability
of being unable to access data is essentially eliminated since
alternate access paths are established automatically when the
primary data source or path is unavailable. Such availability
enhancement, at the cost of communications lines between all
nodes, is most beneficial in systems where real-time data
changes and access are essential to the achievement of
organizational goals.
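
The availability calculation described above can be sketched as follows. The sketch assumes that copies fail independently of one another, which the text does not state; the function names and the sample probabilities are hypothetical.

```python
from math import prod


def prob_data_unavailable(p_local, p_others):
    """Probability that no copy can be reached: the local copy is
    inaccessible AND every other source is inaccessible.
    Assumes independent failures."""
    return p_local * prod(p_others)


def copies_needed(p_fail, target_availability):
    """Smallest number of copies, each inaccessible with probability
    p_fail, such that at least one copy is reachable with the target
    probability."""
    if not (0 <= p_fail < 1 and 0 < target_availability < 1):
        raise ValueError("probabilities must lie in [0, 1) and (0, 1)")
    n = 1
    while 1 - p_fail ** n < target_availability:
        n += 1
    return n


# Hypothetical figures: each copy is inaccessible 10 percent of the time.
print(prob_data_unavailable(0.1, [0.1, 0.1]))  # local copy plus two remote copies
print(copies_needed(0.1, 0.995))               # copies required for 99.5 percent
```

With three such copies the chance of finding none accessible is about one in a thousand, so three copies satisfy the 99.5 percent target while two (99 percent) do not.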

System  Analysis  and  Design 

Perhaps the best definition of the activities encompassed
by the terms systems analysis and design is the identification
and gathering of information about a problem, and the use of
that information to formulate and evaluate potential solutions
to that problem (Ref 19:18-19). Palmer (Ref 13) describes
a structured top-down approach to designing distributed
computing systems by developing "baseline" designs at four levels
of system analysis. The four levels (subsystem/network, nodal,
computer system, and hardware/software) allow early
analysis to focus on assimilation and refinement of
requirements which support design development for that system level
and, more importantly, provide the basis for carrying the design
to the next level of analysis. Each baseline design is
developed using a four-step sequence of activities: analysis,
partitioning, allocation, and synthesis. Each of these activities
is described in some detail below for the subsystem/network
level.


Analysis. This activity involves the identification
and accumulation of user requirement information which details
the functional and data entities, and interrelationships which
exist in the system. The output from this activity would
normally include a set of interrelated functions and data
entities, and information describing the functional performance
requirements (data loading, reliability, etc.).

Partitioning and Allocation. Using the analysis
information, the partitioning and allocation activities involve
applying selected distribution criteria to develop candidate
partitions, or groups/clusters of users. These clusters based
on the distribution criteria would reflect the goals of the
following allocation activity and "good" design practices.
The partitioning criteria typically involve data sharing,
processing, and precedence relationships which exist between
system entities. Although statistical analysis and
simulations may be employed to determine the relative values of
candidate partitions, the numbers and types of nodes are more
often determined to a significant extent by physical
constraints such as the number and location of users to be
supported (Ref 13:24-25).

Synthesis. The goal of the synthesis activity is to
identify the interface and control requirements to maintain
control and data linkages that may have become broken or more
complex as a result of allocation activities. Concerns such
as data management to maintain multiple copies of data and
maintenance of communications with respect to shared data are
important activities that must be accomplished and verified.

Cluster  Development  Techniques 

The development of clusters of users which will be
supported by a single node in the distributed system can be
accomplished in several ways. The method selected often
depends on the constraints on system design resulting from cost
and performance goals. There are also constraints based on
equipment capabilities and even the organization's political
environment which often must be taken into account in the
development process.

Given perfect knowledge of system functions, users,
and cost/benefit tradeoffs in the communications area, it is
theoretically possible to develop simulation programs which
will forecast performance characteristics under the various
clustering concepts devised. In small systems it could be
possible to apply graph theory to identify a minimum spanning
tree of user nodes which minimizes costs and maximizes benefits
to obtain an optimum clustering strategy. As system size
increases, finding an optimum in this way becomes impractical
because the underlying problems are NP-complete, requiring
the use of some heuristic technique (Ref 13:25).

One technique which provides a basis for determining
system distribution patterns is called "fuzzy clustering",
and is based on creation of proximity matrices which show
the degree of commonality that exists between system entities.
Developed by Buckles and Hardin (Ref 6), this technique can
be employed at several levels of detail to present a fuzzy
picture of existing relationships in the system being analyzed.
A subset of the proximity matrix approach is required in some
cases where the full complement of descriptive information is
not available. This less expansive approach uses similarity
matrices which present the percentage of system work that
system entities have in common. Each entry of the similarity
matrix is computed by dividing the number of system actions
that two users have in common by the total number of different
system actions the two users employed. This measure shows the
amount of cohesion that exists between the users, and when
applied across a system will present a measure of cohesion
existing among users supported at a single distributed system node.
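
Stated symbolically, the similarity measure described above can be written as follows. The notation is introduced here for illustration and does not appear in the source: F_a and F_b denote the sets of system functions used at terminals a and b.

```latex
S(a,b) \;=\; \frac{\lvert F_a \cap F_b \rvert}{\lvert F_a \cup F_b \rvert},
\qquad 0 \le S(a,b) \le 1
```

A value of 1 means the two terminals use exactly the same functions, and a value of 0 means they share none. This ratio is commonly known as the Jaccard coefficient.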

Figure 2-5 is the similarity matrix for five terminals
located at the AFMPC (G62, G37, G20, G100, and B2 are the
terminal identifiers). Based on comparison of the different
system functions employed during the months of February and
July 1981 (Fig 2-4), users of terminal G62 used six different
system functions. During the same period, terminals G37 and
G100 were used to access five system functions, all of which
were used by G62 users. This usage information results in
the computation of a similarity value of 83 percent (six
different system functions employed, with five being commonly
used at the terminals being analyzed) reflected in the sample
similarity matrix. When comparing G62 usage with the data
gathered for terminals G20 and B2, the resulting similarity
value is 86 percent (seven functions employed in total, with
six being common). Carrying this process across the
relationships of each of the user pairs results in the completed
matrix shown in Figure 2-5. From a design viewpoint, if
these five users were allocated to a single distributed node,
the result would be a node requiring a total of seven system
functions. With added definition of the files utilized and
eventually the data items and transactions employed, each
node design can be based on providing a minimal set of
attributes which will achieve maximum support of the allocated
users' requirements.


Terminal Id   System Functions

B2            PERST, CANDE, RUN, SURF, GURU, ATLAS, AIRS
G20           PERST, CANDE, RUN, SURF, GURU, ATLAS, AIRS
G37           CANDE, RUN, SURF, GURU, AIRS
G62           PERST, CANDE, RUN, SURF, GURU, AIRS
G100          CANDE, RUN, SURF, GURU, AIRS

Fig 2-4. Function Usage Example


Terminal Id   B2     G20    G37    G62    G100

B2            --     1.0    .71    .86    .71
G20           1.0    --     .71    .86    .71
G37           .71    .71    --     .83    1.0
G62           .86    .86    .83    --     .83
G100          .71    .71    1.0    .83    --

Fig 2-5. Example Similarity Matrix
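
The pairwise computation described for Figure 2-5 can be reproduced mechanically. The following Python sketch is an illustration only, not the thesis software documented in Appendix B. It takes the function usage from Figure 2-4, computes each similarity value as common functions over total distinct functions, and then groups terminals by a target minimal similarity value. The function names, the greedy grouping strategy, and the 0.85 threshold are assumptions made for this example; the limit of 16 terminals per node comes from the RBT concentrator capability listed in Chapter I.

```python
from itertools import combinations

# Function usage by terminal, taken from Fig 2-4.
USAGE = {
    "B2":   {"PERST", "CANDE", "RUN", "SURF", "GURU", "ATLAS", "AIRS"},
    "G20":  {"PERST", "CANDE", "RUN", "SURF", "GURU", "ATLAS", "AIRS"},
    "G37":  {"CANDE", "RUN", "SURF", "GURU", "AIRS"},
    "G62":  {"PERST", "CANDE", "RUN", "SURF", "GURU", "AIRS"},
    "G100": {"CANDE", "RUN", "SURF", "GURU", "AIRS"},
}


def similarity(a, b):
    """Functions used by both terminals divided by functions used by either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(usage):
    """Rounded similarity for every unordered pair of terminals."""
    return {
        (s, t): round(similarity(usage[s], usage[t]), 2)
        for s, t in combinations(sorted(usage), 2)
    }


def cluster_terminals(usage, threshold, max_size=16):
    """Greedy grouping (assumed strategy): a terminal joins the first cluster
    in which it meets the threshold with every member and room remains."""
    clusters = []
    for term in sorted(usage):
        for members in clusters:
            if len(members) < max_size and all(
                similarity(usage[term], usage[m]) >= threshold for m in members
            ):
                members.append(term)
                break
        else:
            clusters.append([term])
    return clusters


def node_functions(usage, members):
    """System functions a node must offer to support all of its terminals."""
    return set().union(*(usage[m] for m in members))


matrix = similarity_matrix(USAGE)
print(matrix[("B2", "G62")], matrix[("G37", "G62")])
print(cluster_terminals(USAGE, threshold=0.85))
print(len(node_functions(USAGE, list(USAGE))))
```

The printed pair values correspond to the .86 and .83 entries of Figure 2-5, and the last line confirms that a single node serving all five terminals would need seven system functions. Raising or lowering the threshold changes how many clusters result, which is the iteration on target minimal similarity values that the design process relies on.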

Data  Distribution  and  Concurrency  Control 

Most of the previous discussion of distributed system
structures has been based on differences in the evolution of
their development and decisions concerning which system
functions should be distributed to best support user requirements.
These concerns were most closely related to the distribution
of system intelligence and the need to migrate or develop
software to support the distributed processing. Closely tied
to this migration of intelligence is the need to develop a
strategy for the distribution of data to support the
distributed functions. Just as distributed system structures occur
in several ways, database distribution is multifaceted and
involves design considerations which are vital to achievement
of the system performance goals. Although complete
development of a database distribution strategy was not possible in
this thesis because data item usage could not be correlated
to individual users, one data distribution strategy is
described below and included in the recommendations portion of
Chapter V.

Many of the advantages claimed for distributed processing
systems, such as insulation from central site failures
and decreased communications costs (Ref 4:29-34), are fully
dependent upon continual access to system data. To ensure
this data access, the first step would appear to be the
location of database copies at each of the distributed system
nodes. If this approach were coupled with the fully connected
network concept described earlier, data access problems could
be virtually eliminated. Of course, such a "perfect system"
approach cannot be economically justified in most cases.
Thus, the better approach is to design the system within
reasonable cost parameters so that it approaches this
perfection at a much lower cost.

Data  distribution  strategies  cover  a  broad  range  of 
possibilities,  from  a  fully  redundant  copy  at  every  system 
node,  to  a  single  copy  for  the  entire  system  (whether  in  one 
location  or  partitioned  to  various  nodes) .  In  centralized 
systems  where  many  users  may  be  accessing  the  same  piece  of 
data at the same time, the primary concerns are the prevention
of deadlock situations and "lost updates" (Ref 5:405).
These problems are also of concern in distributed systems,
along with new problems of data concurrency among replicated
copies of the same data items and of system operation when one
or more system nodes fail.
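
The "lost update" problem can be illustrated with a minimal sketch. The Python below is not drawn from the PDS; the data item and function names are hypothetical and serve only to show how two unsynchronized read-modify-write actions can cause one change to disappear.

```python
def interleaved_updates(days_of_leave):
    """Two users update the same data item with no locking."""
    read_by_a = days_of_leave       # user A reads the item
    read_by_b = days_of_leave       # user B reads the same, unchanged item
    days_of_leave = read_by_a + 10  # user A writes its result
    days_of_leave = read_by_b + 5   # user B overwrites it; A's change is lost
    return days_of_leave


def serial_updates(days_of_leave):
    """The same two updates completed one after the other."""
    days_of_leave = days_of_leave + 10
    days_of_leave = days_of_leave + 5
    return days_of_leave


print(interleaved_updates(30))  # 35
print(serial_updates(30))       # 45
```

The interleaved case returns a value that reflects only the second user's change, which is the outcome locking protocols are intended to prevent.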

Two  common  approaches  to  the  prevention  of  deadlock  and 
the lost update problems are enforcing locking protocols
and allowing deadlocks to occur and then "backing out" one of
the  involved  transactions  so  the  other  may  complete.  Numerous 
locking  approaches  are  documented  in  the  literature  along  with 
proofs of their validity (Ref 18:324-356; 16). In complex
systems where multiple copies of data items may be maintained,
use  of  these  locking  approaches  results  in  serious  service 
delays  because  system-wide  locks  are  employed  and  must  be  syn¬ 
chronized.  This  synchronization  can  be  accomplished  by  the 
utilization  of  system  clocks  (Ref  11:225)  which  enable  the 
generation of individually unique "timestamps" for each system
action, and the use of some PAR (positive acknowledgement
and retransmission) protocol.
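
One way to obtain system-wide unique timestamps can be sketched as follows. The mechanism of the cited reference is not reproduced in this chapter, so the sketch assumes a simple scheme in which each node pairs a local counter with its own node identifier. The class name, method names, and node names are hypothetical.

```python
class NodeClock:
    """Hypothetical per-node clock producing system-wide unique timestamps."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def next_timestamp(self):
        # Pairing the local counter with the node identifier means two
        # different nodes can never generate the same timestamp.
        self.counter += 1
        return (self.counter, self.node_id)

    def observe(self, timestamp):
        # On receiving a message, catch up with the sender's counter so
        # later local actions are ordered after the received action.
        self.counter = max(self.counter, timestamp[0])


randolph = NodeClock("RANDOLPH")
pentagon = NodeClock("PENTAGON")

first = randolph.next_timestamp()
second = pentagon.next_timestamp()
pentagon.observe(first)
third = pentagon.next_timestamp()
```

Because timestamps are compared as (counter, node) pairs, any two of them can be placed in a single agreed order even when their counters are equal.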

Although  such  approaches  will  achieve  data  consistency 
eventually,  applying  data  locking  procedures  across  the  sys¬ 
tem  is  cost  prohibitive  in  distributed  systems  where  multiple 
copies  of  data  items  are  maintained  (Ref  15:354-355).  Recent 
work,  especially  for  the  System  for  Distributed  Databases 
(SDD-1) (Ref 16), presents the concept of conflict graph
analysis  that  can  be  utilized  during  system  design  as  a  method 
for  reducing  the  amount  of  concurrency  control  overhead. 

In  most  current  systems,  data  locking  is  employed  to 
prevent  the  problems  of  lost  update  and  deadly  embrace  dis¬ 
cussed  earlier.  Once  a  system  action  has  begun,  the  data  to 
be  used  is  locked,  preventing  access  by  any  other  user  until 
the  first  action  is  completed.  In  a  centralized  system  where 
internal communications occur at machine speed, such procedures
are adequate. However, a distributed system with data
replications  in  several  dispersed  locations  would  soon  cease 
to  operate  if  all  copies  of  the  data  had  to  be  locked  before 
an  action  could  be  started,  and  unlocked  before  the  next 
could begin. Conflict graph analysis rests on the observation
that only a small proportion of the
transactions  will  actually  conflict  with  each  other  in  a  dan¬ 
gerous  manner.  By  analyzing  system  transactions  and  assign¬ 
ing  them  to  transaction  classes  which  are  then  graphed  to 
portray  existing  conflicts,  the  designer  is  able  to  develop 
a  series  of  concurrency  control  actions,  most  of  which  will 
be  considerably  less  complex  and  time  consuming  than  system- 
wide  locking  (Ref  12:289-290). 
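
The idea of grouping transactions into classes and graphing their conflicts can be sketched in simplified form. The Python below is not the SDD-1 algorithm; it assumes that each transaction class is described only by the set of data items it reads and the set it writes, and that two classes conflict when one writes an item the other reads or writes. The class names and data items are hypothetical.

```python
from itertools import combinations


def conflicts(class_a, class_b):
    """Return True when one class writes an item the other reads or writes."""
    reads_a, writes_a = class_a
    reads_b, writes_b = class_b
    return bool(writes_a & (reads_b | writes_b) or writes_b & reads_a)


def build_conflict_graph(classes):
    """Map each transaction class name to the set of classes it conflicts with."""
    graph = {name: set() for name in classes}
    for left, right in combinations(classes, 2):
        if conflicts(classes[left], classes[right]):
            graph[left].add(right)
            graph[right].add(left)
    return graph


# Each class is a (read set, write set) pair of data item names.
classes = {
    "query_record": ({"duty_history", "promotion"}, set()),
    "update_promotion": ({"promotion"}, {"promotion"}),
    "update_training": ({"training"}, {"training"}),
}
graph = build_conflict_graph(classes)
```

In this example the class "update_training" has no edges, so under the stated assumptions it would need no synchronization with the other two classes, while "query_record" and "update_promotion" would.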

The basis for data locking procedures, as well as the
procedures employed in SDD-1, is the concept of serializability.
The term serializability comes from the need to demonstrate
that two schedules for the accomplishment of the actions in
two transactions produce identical results when one schedule
involves interleaving of actions while the other schedule
achieves the same result from serial completion of the
operations. Figure 2-6 uses the transactions T1 and T2 to show
that  depending  on  the  sequence  in  which  two  transactions  are 
completed,  the  resulting  impact  on  the  database  can  be  changed 
entirely.  Without  some  background  information  indicating  the 
intent  of  the  users,  it  is  impossible  for  any  logic  within 
the  system  to  distinguish  between  the  two  "valid"  results, 
123-4567  and  124-5678.  Thus,  for  this  class  of  transactions, 
either  result  must  be  accepted  by  system  designers  as  a  valid 
result,  and  any  schedule  which  obtains  one  of  these  results 
must  be  considered  valid  also.  In  SDD-1,  concurrency  control 
is  based  on  the  ability  to  obtain  serializability  in  many  cases 
without  requiring  data  locking  on  all  copies  of  a  data  item 
across  the  entire  distributed  system.  In  the  case  of  trans¬ 
actions  T1  and  T2,  as  long  as  the  transaction  writes  are 
completed  in  the  same  sequence  in  every  case,  the  result  will 
be  consistent  across  the  system.  This  consistent  accomplish¬ 
ment  can  be  assured  by  the  implementation  of  a  protocol  re¬ 
quiring  all  system  nodes  to  process  such  transactions  in 
time-stamp  order.  This  protocol  will  not  satisfy  all  trans¬ 
actions;  however,  many  transactions  will  satisfy  the  seriali¬ 
zability  concept  with  this  protocol,  or  another  that  is  less 
restrictive  than  one  requiring  transmission  and  acknowledge¬ 
ment  of  system-wide  data  locks/unlocks. 
TRANSACTION T1:
    Replace PHONE-NR <1234567> Where NAME = 'ADAMS'.

TRANSACTION T2:
    Replace PHONE-NR <1245678> Where NAME = 'ADAMS'.

             SCHEDULE 1     SCHEDULE 2

                 T1             T2
                 T2             T1

Result:       1245678        1234567

Fig 2-6. Serial Transaction Schedules
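
The effect of handling the writes of T1 and T2 in time-stamp order at every node can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDD-1 protocol: each copy is assumed to remember the timestamp of the last write it applied and to ignore any write carrying an older timestamp, which yields the same final value as applying the writes in time-stamp order. The Replica class and the timestamp values are hypothetical.

```python
class Replica:
    """Hypothetical copy of a data item held at one system node."""

    def __init__(self):
        self.values = {}  # name -> (timestamp, phone number)

    def write(self, timestamp, name, phone):
        current = self.values.get(name)
        # Apply the write only if it is newer than the one already held.
        if current is None or timestamp > current[0]:
            self.values[name] = (timestamp, phone)

    def read(self, name):
        return self.values[name][1]


# T1 and T2 from Figure 2-6, each tagged with an assumed unique timestamp.
t1 = ((1, "NODE-A"), "ADAMS", "1234567")
t2 = ((2, "NODE-B"), "ADAMS", "1245678")

node_x = Replica()
node_y = Replica()

# The two nodes receive the same transactions in opposite orders.
for transaction in (t1, t2):
    node_x.write(*transaction)
for transaction in (t2, t1):
    node_y.write(*transaction)

print(node_x.read("ADAMS"), node_y.read("ADAMS"))
```

Both copies end with the value written by the transaction holding the later timestamp, so the result is consistent across nodes without any system-wide lock.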


Documentation of SDD-1 protocols which ensure serializability
(Ref 3) includes complete coverage of the concurrency
control problem and proves the applicability of the conflict
graph  analysis  approach  to  the  other  instances  of  dangerous 
transaction  conflict.  As  will  be  discussed  in  Chapter  V, 
documentation  of  PDS  transactions,  and  analysis  of  distributed 
system  nodes  which  are  sources  for  input  of  these  transactions 
would  allow  application  of  conflict  graph  analysis,  and  the 
capability  for  expansion  of  on-line  update  capabilities 
throughout  the  system  while  avoiding  the  overhead  currently 
envisioned. 

Summary 

The theoretical foundation for the distributed processing
system designed was developed from an analysis of the
literature on the subject and background knowledge of the
central site environment. The current system is an example
of a hybrid distributed system with elements of both the
hierarchical and horizontal approaches. The general evolutionary
pattern of the system reinforces this structure, and the
central  site  distribution  developed  in  this  work  will  con¬ 
tinue  that  evolution. 

Two  major  benefits  of  distributed  processing  which  have 
been  discussed  in  this  chapter  appear  extremely  applicable  to 
the  solution  of  current  PDS  central  site  problems.  As  with 
most  large  systems,  the  continual  requirement  to  develop  and 
implement new and more powerful applications has taxed the
growth potential of the currently available hardware. Expansion
of current system capabilities, before the installation
of the reacquisition hardware, will result in service
degradations for many current users. Migration of functions to
some distributed system nodes could immediately release central
site resources for the required new capabilities, or to
improve current system performance. Additionally, the adverse
impact of hardware failures, which occur more frequently with
the older hardware currently in use, can be avoided
as a result of the "insulation" which is possible under
distributed system concepts.
Introduction


The  Personnel  Data  System,  as  it  exists  today,  is  the 
result of a twenty-year evolution caused by changing Air
Force needs in the "people resource management" environment.
As laws, policies, and social pressures have required Air
Force managers to function differently, the PDS has been
modified and expanded to better support these new functional
requirements.

While  the  base  level  portion  of  the  PDS  continues  to 
support the daily operations of base level manpower and
personnel offices, the central site system has grown by
incorporating increased amounts of data and new processing
functions, and by assuming workloads from other associated
systems. Since
the  world-wide  implementation  of  the  Advanced  Personnel  Data 
System  (APDS)  in  1974,  the  central  site  has  been  the  center 
of  efforts  to  provide  a  fully  integrated  support  environment 
for  total  force  management  in  the  Air  Force.  The  current 
system  structure  was  shown  in  Figure  1-1. 

PDS  Information 

The  core  of  the  PDS  central  site  system  consists  of  the 
master  data  files  which  contain  digital  records  for  every 
individual  in  the  total  force  Air  Force,  retired  personnel, 
and  individuals  actively  interested  in  being  recruited  into 
the Air Force. These files, consisting of more than one million
records, are maintained in an up-to-date condition primarily
through batch update actions based on data flow from
the  base  and  Headquarters  Air  Force  levels  of  the  PDS. 

These  digital  records  are  all  similar  in  many  respects, 
since  they  depict  historical  data  about  personnel  actions 
relating  to  an  individual.  But  there  are  unique  aspects  de¬ 
pending  on  the  Air  Force  component  with  which  the  individual 
is  associated.  Typical  information  that  is  retained  describes 
duty  history,  promotion  history,  training,  information  con¬ 
cerning  current  assignment  and  place  of  residence,  and  many 
other  data  items  used  by  Manpower  and  Personnel  managers 
when  performing  some  action  or  making  a  decision  affecting 
the  individual. 

Typically,  a  data  item  relates  to  one  specific  func¬ 
tional  area  such  as  promotions,  assignments,  or  training, 
and  an  office  within  this  functional  area  assumes  management 
responsibility for that data item. These offices of primary
responsibility (OPRs) in many cases are the only source of
data changes to the items, or they are responsible for ensuring
that changes made to the items by other system users are
appropriate. Because of the size of the system, the validations
necessary to ensure this data management are accomplished
within the system. The necessary controls are documented
by the OPR and implemented by the personnel of the
Directorate of Personnel Data Systems (AFMPC/MPCD).


System  Users 

Both  the  variety  and  number  of  system  users  supported 
from  the  central  site  have  increased  rapidly  during  the  last 
seven  years.  More  than  five  hundred  system  terminals  and  16 
minicomputers  currently  provide  access  to  system  data  for 
users  world-wide.  These  users  are  at  every  level  of  Air 
Force  Personnel  and  Manpower  management,  and  are  employed  in 
every  facet  of  these  functional  areas.  The  world-wide  dis¬ 
persion  of  users  has  resulted  in  the  development  of  a  large 
communications  network  to  provide  direct  system  access  to 
the  PDS  information  for  these  users.  To  present  a  clearer 
picture  of  the  PDS  user  community,  five  major  segments  are 
described  below. 

San Antonio, Texas. More than 200 PDS terminals are
located in this area, with the large majority being employed
at the AFMPC, the Office of Civilian Personnel Operations
(OCPO), and Headquarters, Air Training Command (ATC) at
Randolph AFB, Texas. Other user groups are at Lackland,
Kelly, and Brooks Air Force Bases.

Washington, D.C. Air staff functions at the Pentagon,
major commands at Langley and Andrews Air Force Bases,
separate operating agencies (SOAs) at several locations, and
other Air Force and non-Air Force organizations are included
in this widely diverse group of users in the Washington
area.


Major Commands. Each of the Air Force MAJCOMs is supported
to some extent by the PDS. User groups in this category
that have not been mentioned previously are Pacific Air
Forces (PACAF), United States Air Forces in Europe (USAFE),
Alaskan Air Command (AAC), Strategic Air Command (SAC),
Military Airlift Command (MAC), Air Force Communications Command
(AFCC), and Air Force Logistics Command (AFLC). Each of
these organizations utilizes PDS information in the management
of  their  personnel  resources.  The  number  of  terminals  at 
each  location  varies  according  to  staff  size  and  the  volume 
of  PDS  workload  represented. 

Separate Operating Agencies and Direct Reporting Units.
Users  in  these  categories  are  widely  dispersed  across  the  con¬ 
tinental  United  States.  In  most  cases  one  or  two  system  ter¬ 
minals  have  been  provided  to  support  the  personnel  management 
activities  of  the  organization's  staff.  Included  in  this 
category  are  the  Air  Reserve  Personnel  Center  (ARPC)  and 
Headquarters, Air Force Reserves (AFRES), both of which have
several  system  terminals  because  of  the  size  of  their  staff 
and  scope  of  responsibilities  supported. 

Technical Training Centers (TTCs). The TTCs at Chanute,
Keesler, Lowry, and Sheppard Air Force Bases have recently
been given PDS system access to support those of their
training management requirements that relate to system
capabilities.
System Functions

As mentioned previously, the bulk of data maintenance
accomplished in the system occurs as a result of batch update
processes. Day-to-day interaction with the system for
most users involves determining the status of a
person's record in one specific functional aspect, or entering
a change of status action which is projected to occur
at some future date. Many personnel actions recorded in the
system are of this projected type: the batch update of the
system on the projected date causes an automatic generation of
appropriate transactions notifying interested personnel that the
action has occurred. Printed notices are the typical method
employed to document completed actions, with the appropriate
office maintaining a copy of the notice for some period.

Some  functional  offices  supported  by  the  system  manage  very 
large  personnel  programs  such  as  airmen  promotions  and  assign¬ 
ments,  which  preclude  manual  entry  of  all  data  changes.  In 
such  cases,  one  batch  processing  action  will  create  output 
files  containing  the  appropriate  update  transactions,  and 
these  files  will  be  used  as  input  files  to  the  next  update 
of  the  proper  master  file. 
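
The handling of projected actions during a batch update can be sketched as follows. This is only an illustration of the behavior described above, not PDS software; the record layout, field names, and notice text are all assumptions.

```python
from datetime import date


def run_batch_update(master_file, projected_actions, run_date):
    """Apply every projected action whose effective date has arrived."""
    notices = []
    still_pending = []
    for action in projected_actions:
        if action["effective"] <= run_date:
            record = master_file[action["record_id"]]
            record[action["item"]] = action["value"]
            notices.append(
                f"{action['record_id']}: {action['item']} "
                f"changed to {action['value']}"
            )
        else:
            # Not yet due; carry the action forward to a later update.
            still_pending.append(action)
    return still_pending, notices


master_file = {"R001": {"grade": "SSGT"}, "R002": {"grade": "AMN"}}
projected = [
    {"record_id": "R001", "item": "grade", "value": "TSGT",
     "effective": date(1981, 7, 1)},
    {"record_id": "R002", "item": "grade", "value": "A1C",
     "effective": date(1981, 9, 1)},
]

pending, notices = run_batch_update(master_file, projected, date(1981, 7, 1))
```

Only the action dated on or before the run date is applied and produces a notice; the other remains pending for a later batch update.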

To  support  these  kinds  of  user  activities,  the  PDS 
provides  a  large  number  of  system  functions.  There  are  pro¬ 
bably  more  than  sixty  individual  functions  available  on  the 
system; however, the ones described below are the most heavily
used  and  of  interest  for  the  system  design  being  developed. 

SURF.  The  SURF  function  is  a  single  record  query  capa¬ 
bility.  SURF  is  heavily  used  by  almost  all  system  users  to 
access  preformatted  displays  of  portions  of  an  individual’s 
computerized  personnel  record.  Access  to  the  SURF  function 
requires  input  of  an  approved  usercode/password  combination 
and selection of the SURF function for use on one of the master
files. Once access to SURF and the file is established,
the  user  must  input  the  Social  Security  Account  Number  (SSAN) 
for  the  individual's  record  and  the  format  number  or  indivi¬ 
dual  data  item  number  to  be  displayed.  SURF  displays  typi¬ 
cally  provide  information  related  to  one  personnel  functional 
area  of  interest,  with  many  master  files  having  twenty  or 
more  standard  display  formats. 

PERSTRANS. The PERSTRANS function, abbreviated "PERST"
on  reports,  provides  the  capability  for  a  user  to  enter  a 
transaction  into  the  system  to  update  a  single  personnel 
record.  These  transactions  may  cause  data  changes,  create 
a  printed  report,  or  project  a  future  data  change  for  the 
specified  record.  Access  to  this  function  is  very  similar  to 
the  requirements  for  the  SURF  function,  with  the  transaction 
identification  code  being  input  to  specify  the  appropriate 
file and transaction format. Although PERSTRANS is not as widely
used as the SURF function, more than fifty percent of the users
studied in this thesis utilize it.

ATLAS. The ATLAS function supports the creation of
batch queries against PDS files. Any file described in a
file descriptor table can be accessed through ATLAS inquiries.
These tables describe all of the attributes of the records in
the file to be accessed. ATLAS queries in most cases will
produce either a printed listing reflecting information
requested by the user, or an output data file which can be used
for batch update, generation of reports, further queries, etc.
The  majority  of  system  users  studied  in  this  thesis  have 
access  to  the  ATLAS  function. 

Airmen Information Retrieval System (AIRS). The AIRS
function  is  utilized  by  a  limited  group  of  system  users  to 
access  data  files  which  reflect  airmen  resource  utilization 
throughout  the  Air  Force. 

GURU. The GURU function is a generalized file manipulation
utility which allows users to create data and text
files, review and alter print files before printing, and
perform other data editing activities. Access to this function
is widespread amongst users, and it serves as a replacement
for the CANDE function described below.

CANDE.  CANDE  is  a  Burroughs  Corporation  system  utility 
that  performs  many  of  the  same  functions  as  GURU.  Since  im¬ 
plementation  of  the  GURU  function  about  three  years  ago,  use 
of  CANDE  has  decreased  considerably. 
INFO. The INFO function is another older function that
has been largely replaced by the GURU function. INFO's primary
purpose was the creation of informational text files by
various users that could then be accessed by users across
the system. Such files often contained schedules or system
status notices which were maintained by system management
personnel.

Personnel Data System Training (PDST). The PDST function
is available to a limited number of system users who
are involved in the management of Air Force training programs.
This function includes the capability for real-time allocation
of training quotas, and access to training program status
information.

PROMIS. The PROMIS system, also referred to as the
Pipeline Management System (PMS), supports the management
of various aspects of the training pipeline. PROMIS users for
the  most  part  are  the  Armed  Forces  Entrance  and  Examining 
Stations  (AFEES)  throughout  the  United  States  where  Air  Force 
recruiters  attempt  to  obtain  entry  authorizations  for  pro¬ 
spective  Air  Force  recruits.  A  major  part  of  the  PROMIS  sup¬ 
port  is  the  real-time  person-job-match  process  which  is  used 
to  identify  Air  Force  requirements  for  which  a  specific  appli¬ 
cant  may  qualify.  The  system  runs  under  the  control  of  the 
Burroughs  database  management  system  software.  The  size  of 
the  software  supporting  the  PROMIS  function  necessitated 
exclusion of PROMIS users from the distributed system design.

Other  Functions.  As  indicated  earlier,  many  other  sys¬ 
tem  functions  are  available  to  support  PDS  user  requirements. 
Separate  functions  which  support  software  testing,  computa¬ 
tion  of  retirement  pay  estimates,  analysis  of  officer  manning, 
and  modelling  of  force  structure  changes  are  utilized  by  small 
groups  of  users  according  to  their  individual  functional  re¬ 
quirements.  Unfortunately,  system  workload  in  these  functions 
is  not  measured  separately  in  any  system  reports  and  could 
not  be  reflected  in  the  database  built  in  this  thesis. 

Documentation  of  Function  Usage 

As discussed in Chapter II, the basis for user clustering
was the similarity matrix which depicts the percentage of
system function and file usage that users have in common. The
system documentation which provided the picture of user workload
patterns was drawn from various system generated reports.
Because the central site is comprised of multiple subsystems,
no consolidated report documenting system workload is produced.
Therefore, the amount of information that was available
to reflect user employment of system functions varied
greatly.  The  various  sources  are  described  below,  with  a 
brief  discussion  included  to  indicate  the  limitations  of 
some  of  the  sources. 
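
A similarity matrix of the kind referred to above can be sketched as follows. The exact similarity formula is defined in Chapter II and is not repeated here, so this sketch assumes one plausible definition: the number of functions and files two terminals have in common, expressed as a percentage of all functions and files used by either terminal. The terminal identifiers and usage sets are hypothetical.

```python
def similarity_percent(items_a, items_b):
    """Percentage of items in common, relative to all items used by either."""
    union = items_a | items_b
    if not union:
        return 0.0
    return 100.0 * len(items_a & items_b) / len(union)


def similarity_matrix(usage):
    """Build a matrix keyed by (terminal, terminal) pairs."""
    ids = sorted(usage)
    return {
        (a, b): similarity_percent(usage[a], usage[b])
        for a in ids
        for b in ids
    }


# Hypothetical usage sets: system functions employed at each terminal.
usage = {
    "T001": {"SURF", "PERST", "ATLAS"},
    "T002": {"SURF", "PERST"},
    "T003": {"PROMIS"},
}
matrix = similarity_matrix(usage)
print(round(matrix[("T001", "T002")], 1))
```

Under this assumed definition, terminals T001 and T002 share two of three items, while T003 has nothing in common with either and would not be drawn into the same cluster on similarity alone.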

TERMSTRIP Reports. The "TERMSTRIP" reports provide a
detailed picture of the transaction processing requirements
of each user and terminal using the PERSTRANS function.
These reports include a list of all terminals which were the
source of transactions for the period of the report, and the
number of transactions input. In subsequent sections the
terminal identifier, user identifier, time of transaction
input, and the transaction PTI are recorded. The reports
are generated twice daily as the input transactions are
"stripped" to input files for later entry in batch updates.
The average daily volume of transactions input under the
PERST function exceeds five thousand; therefore, a limited
sample of these reports (twenty days) was used to identify
the master files each terminal was used to access. As will
be  discussed  in  Chapter  V,  automation  of  the  analysis  of 
these  reports  would  provide  a  valuable  source  of  data  usage 
information  upon  which  to  base  expansion  of  the  similarity/ 
proximity  matrix  analysis  in  future  efforts. 

TERMUSE Reports. These monthly reports describe the
use  of  system  functions  by  every  terminal  in  the  system. 

The  information  provided  includes  the  terminal  identifier, 
function  usage  in  hours,  number  of  transactions,  and  number 
of  transactions  per  hour  for  13  specific  system  functions 
and  a  grouping  under  the  title  "RUN  PROG."  Section  two  of 
this  report  lists  the  usercodes  employed  at  each  terminal 
during  the  period  of  the  report.  The  information  from  this 
report  was  critical  to  the  clustering  process,  in  that  it 
provided  the  basis  for  establishing  a  terminal  usage  pattern 
which strongly suggested optimum assignment of terminals to
distributed  nodes  based  on  the  realities  of  actual  system 
workload. 

Other Sources. Additional system documentation came
from several sources: reports listing all assigned usercodes,
including the user's location and office symbol; a listing which
described every system function and the size of the software
supporting it; and the summary reject trend report, which
provided statistics by user identifier on the individual
transactions input during the month of February 1981 and the
percent of rejects found during the subsequent batch update
process. Documentation of system workload in several areas
is not created at any level comparable to these reports
for the PERSTRANS function. Additionally, the grouping of
as many as forty functions under the "RUN PROG" category
prevents further definition of user workload for this
investigation beyond the level of the TERMUSE report.
IV Distributed Design Development


Introduction 

The  evolution  of  the  design  strategy  employed  to  arrive 
at  the  distributed  system  structure  recommended  for  the  PDS 
central  site  was  perhaps  more  a  result  of  the  current  system 
evolution  than  an  embrace  of  theoretical  concepts.  The  geo¬ 
graphic  dispersion  of  the  PDS  user  community  in  many  instances 
dictated  establishment  of  a  system  node  at  a  certain  location 
although  no  strong  system  function  cohesion  existed.  However, 
the four-phase design approach presented by Palmer (Ref 12),
and discussed in Chapter II, was employed to obtain a
structured methodology.

The  intent  of  Chapter  III  was  to  present  a  clear  depic¬ 
tion  of  current  system  operations  and  the  user  community 
being served, and to identify the sources and limitations of
available  system  reports  that  reflected  user  requirements. 
These  requirements,  or  system  functions  actually  employed, 
could  be  utilized  as  one  of  the  important  attributes  for 
developing  the  distributed  design,  as  well  as  a  measure  of 
cluster  cohesion  reflecting  the  strength  of  the  design. 

Having  identified  a  structured  approach  for  the  design  pro¬ 
cess  and  the  information  to  be  employed,  the  bulk  of  the 
design  effort  was  ready  to  proceed. 
User Database Development

Since  several  sources  of  information  were  available  to 
document  the  system  work  accomplished  by  users,  some  linking 
workload  to  individual  usercodes  and  others  showing  terminal 
identifiers,  a  common  basis  for  analysis  was  required.  The 
basis  selected  was  the  terminal  identifier  primarily  as  a 
recognition  of  the  fluidity  of  the  users  within  office  areas. 
Because  the  present  system  authorizes  system  functions  to 
usercodes,  with  essentially  no  restrictions  based  on  which 
terminals  may  be  used,  users  often  "shop  around"  for  the 
nearest  available  terminal.  A  review  of  the  TERMSTRIP  re¬ 
port  documents  this  user  movement,  showing  transactions  in¬ 
put  by  the  same  user  at  several  different  terminals  on  the 
same  day,  and  the  use  of  one  terminal  by  three  or  four  users 
during  a  one  hour  period.  Since  the  terminal  identifier  is 
the  primary  subject  of  the  TERMUSE  report  containing  the 
functions  employed  and  the  TERMSTRIP  provides  both  the  ter¬ 
minal  and  user  identifiers,  use  of  the  terminal  identifier 
would  ensure  the  user  fluidity  was  included  in  the  database 
reflecting  system  operations. 

Other important items of information required for depiction
of system usage and for maintaining a record of user attributes
were identified as the various data sources were reviewed.
The final form of the user/terminal information record
included the terminal identifier, office symbol or geographic
location, and the list of system functions and master files
used at the terminal.

The user database was created using the software program
described in the next section of this chapter. User
records were built based on review of the summary reports of
the TERMUSE report for March 1981 and were later updated to
reflect conditions as of 1 July 1981, after validation of the
original database at AFMPC in August. System function usage
information was added to each user record from the TERMUSE
report, while master file requirements were added based on
PERSTRANS transactions reflected in the TERMSTRIP reports.

The  final  version  of  the  database  contained  425  terminal 
records  and  more  than  6000  system  functions  and  master  file 
designations.  The  validation  visit  to  AFMPC  primarily  served 
to  identify  portable  terminals  and  output-only  devices  such 
as  printers  and  punches  which  were  removed  from  the  database. 
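
The record structure and database construction described in this section can be sketched as follows. This is not the program listed in Appendix B; it is a minimal Python illustration in which the class name, field names, input layouts, and sample identifiers are all assumptions. Function usage stands in for data taken from the TERMUSE report, and file usage for data taken from the TERMSTRIP reports.

```python
from dataclasses import dataclass, field


@dataclass
class TerminalRecord:
    """One user/terminal information record."""
    terminal_id: str
    location: str  # office symbol or geographic location
    functions: set = field(default_factory=set)
    master_files: set = field(default_factory=set)


def build_database(terminals, function_usage, file_usage, excluded_ids=()):
    """Build terminal records, skipping portable and output-only devices."""
    database = {}
    for terminal_id, location in terminals:
        if terminal_id not in excluded_ids:
            database[terminal_id] = TerminalRecord(terminal_id, location)
    for terminal_id, function in function_usage:
        if terminal_id in database:
            database[terminal_id].functions.add(function)
    for terminal_id, master_file in file_usage:
        if terminal_id in database:
            database[terminal_id].master_files.add(master_file)
    return database


terminals = [("T001", "MPCD"), ("T002", "MPCD"), ("P001", "MPCD")]
function_usage = [("T001", "SURF"), ("T001", "PERST"), ("T002", "SURF"),
                  ("P001", "RUN PROG")]
file_usage = [("T001", "OFFICER"), ("T001", "AIRMAN")]

database = build_database(terminals, function_usage, file_usage,
                          excluded_ids={"P001"})
```

Keying the database on the terminal identifier mirrors the choice made above, since both report types can be related to a terminal even when users move between terminals.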

Software  Development 

The  iterative  nature  of  the  clustering  process  and  the 
large  number  of  users  and  system  functions  that  would  be  in¬ 
cluded  in  the  database  strongly  suggested  the  requirement  for 
development  of  a  set  of  automated  tools.  The  requirements  to 
be  satisfied  by  these  software  tools  included  the  ability  to 
create  terminal  records  containing  the  attributes  described 
in  the  previous  section,  support  of  database  maintenance, 
and  the  various  types  of  clustering  concepts  to  be  employed 
during the design development process. Each of the specific
capabilities supported by the software is documented in the
program listing provided as Appendix B. The use of the various
clustering capabilities is described later in this chapter
as the design process is presented.

Partitioning  and  Allocation 

The  partitioning  and  allocation  activities  are  discussed 
together  for  the  subsystem/network  level  as  a  recognition  of 
the impact of resource constraints on the partitioning activity
at this level (Ref 13:23-25). The two overriding constraints
which limited freedom in this process were the number
and  location  of  system  nodes  and  the  computer  power  available 
at  the  nodes.  Descriptive  information  obtained  from  sources 
at  AFMPC  which  described  the  purpose  of  each  system  function 
and  the  amount  of  memory  required  for  the  supporting  software 
strongly  influenced  the  final  form  of  the  user  database  de¬ 
scribed  earlier.  The  size  (76,000  lines  of  source  code)  and 
structure  of  the  PROMIS  subsystem  dictated  exclusion  of  users 
primarily  using  this  function.  Some  users  with  small  amounts 
of  recorded  PROMIS  workload  were  retained  if  they  employed 
the  other  system  functions.  The  exclusion  of  these  users  is 
not  meant  to  rule  out  the  possibility  of  applying  distributed 
system  concepts  to  them  in  the  future,  but  rather  recognizes 
the  fact  that  a  considerably  expanded  analysis  of  this  sub¬ 
system  is  required  before  inclusion  in  the  distributed  system 
could  be  evaluated.  The  elimination  of  this  group  of  users 
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established  the  final  form  of  the  user  database  with  425 
terminals  as  mentioned  previously. 

Criteria Selection and Thresholds. The criteria which
are used to judge processing entities of the proposed system
must be measurable and reflect the goals of the design
activities that follow. For the PDS, as for any system with
a wide dispersion of users, user location commonality must be
a primary criterion. In several instances, reflected in
Figure 4-1, groups of terminals can only logically be
allocated to the same distributed node because of geographic
location. Where large groups of terminals were located in
the same area, the other criteria were applied to establish
nodes of acceptable size.

Fig 4-1. PDS Terminal Locations

After application of the locality criterion, the second
criterion selected was the similarity of system functions
employed at the terminals in the area. The goal of the
allocation activity in applying this criterion was to obtain
the highest degree of cohesion for the groups of terminals
allocated to each distributed node. Ancillary to the
similarity criterion was the establishment of similarity value
thresholds for the clustering process and identification of
the appropriate number of terminals to allocate to each node
computer. As described by an AFMPC reacquisition project
officer (Ref 5), the computers envisioned for use as
distributed system nodes would be capable of supporting up to
sixteen terminals each. If the system were designed with each
node supporting a full complement of sixteen terminals, any
new workloads or users would immediately necessitate
reconfiguration. Therefore, a target of no more than twelve
terminals at each node was selected. This means more nodes
will be required to support the terminals included in the
distributed plan, but each node will be somewhat less
expensive, and considerable upward flexibility for future
system changes will be provided.
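The trade-off behind the twelve-terminal target can be illustrated with simple arithmetic. The Python sketch below assumes perfect packing of terminals onto nodes, which the geographic constraints discussed in this chapter do not permit, so it gives only a lower bound on the node count. The function and constant names are illustrative.

```python
import math

NODE_CAPACITY = 16   # terminals a node computer can support (Ref 5)
TARGET_SIZE = 12     # design threshold selected in the text
TERMINALS = 425      # terminals in the final user database


def minimum_nodes(terminals: int, per_node: int) -> int:
    """Fewest nodes needed if terminals could be packed without regard to location."""
    return math.ceil(terminals / per_node)


def spare_ports(target: int = TARGET_SIZE, capacity: int = NODE_CAPACITY) -> int:
    """Unused terminal connections left on each node for future growth."""
    return capacity - target


full_nodes = minimum_nodes(TERMINALS, NODE_CAPACITY)   # lower bound at 16 per node
target_nodes = minimum_nodes(TERMINALS, TARGET_SIZE)   # lower bound at 12 per node
```

Under these assumptions the lower bound rises from 27 nodes to 36, and each node keeps four spare connections. The design actually developed uses more nodes than either bound because terminal location takes precedence over packing efficiency.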

Additional  partitioning  and  allocation  criteria  which 
may  be  applied  in  the  future,  once  additional  information  is 
available,  are  described  in  Chapter  V. 

Relationship Evaluation. The data flow diagram in
Figure 4-2 depicts the transformation of the user database
into the final distributed design consisting of 46 clusters.
The number of clusters or nodes created by each transformation
process is included in parentheses on the data streams
arriving at the final transform, "Design Synthesis." Each
of the transformations utilized one or more of the
capabilities provided in the software developed as part of
this thesis (Appendix B).

The application of the terminal location criterion
identified 12 terminal clusters of acceptable size (not more
than 12 terminals), 31 terminals scattered to various areas
that could not be considered common to any of the other
clusters, and eight clusters with more than twelve terminals
each. Three of these initial clusters represented AFMPC,
ATC, and OCPO located at Randolph AFB. Their transformation
into three separate clusters based on locality was also
influenced by the size of each organization's user community
and their separate locations on Randolph.

Fig 4-2. Cluster Development DFD

Terminal ID's        Location of User

*B95                 Army Personnel Center, Alexandria,
B108                 AFTAC/DPM, Patrick AFB, Fla.
B115/G206            Goodfellow AFB, Tx.
*FAO                 Air National Guard, White Plains,
*G24/G310            AFMEA, Randolph AFB, Tx.
*G29                 AFCOMS, Kelly AFB, Tx.
*G50                 Naval Observatory, Washington, D.C.
G68/G97/G205         Brooks AFB, Tx.
*G91/G304            Maxwell AFB, Al.
*G302                Air University, Gunter AFS, Al.
G312/G336/G337       Colorado Springs, Col.
G313/G415            Norton AFB, Ca.
*G393                AFIS, Fort Ritchie, Md.
*G394                AFIS, Fort Belvoir, Va.
G395                 AFTEC, Kirtland AFB, N.M.
G397                 AFISC, Presidio of Monterey, Ca.
G398                 AFESC, Tyndall AFB, Fla.
G422/G502            Bergstrom AFB, Tx.
*G500/G504           Dobbins AFB, Ga.
G501                 McClellan AFB, Ca.
G503                 Air National Guard, Camp Mabry, Tx.

Fig 4-3. Dispersed Terminal Locations

The 12 clusters of acceptable size achieved at this
level of the design process are documented as the first 12
clusters in Appendix C. Although cluster cohesion based on
similarity values was not a consideration in the design of
these clusters, similarity values were computed, with average
cluster similarities ranging from 100% to 25%. This average
cluster similarity value is shown below the similarity tables
for each cluster. It was computed by obtaining the average
similarity of each terminal to all other terminals in the
cluster and then averaging these average values. In those
instances where terminals in a cluster had no recorded
workload for the collection period, the cluster similarity
values do not include the impact of such terminals.
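The two-stage averaging described above can be sketched as follows. The pairwise similarity measure itself is not restated in this section, so the sketch substitutes a simple percentage overlap of function sets (shared functions divided by all functions used by either terminal). That measure, the function names, and the sample terminals in the usage note are assumptions for illustration, not the computation performed by the Appendix B software.

```python
def similarity(a: set, b: set) -> float:
    """Assumed stand-in measure: percent overlap of two terminals' function sets."""
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


def average_cluster_similarity(cluster: dict):
    """Average of each terminal's average similarity to the other terminals.

    `cluster` maps a terminal ID to the set of system functions it used.
    Terminals with no recorded workload are left out, as in the text.
    Returns None when fewer than two terminals have workload.
    """
    active = {tid: funcs for tid, funcs in cluster.items() if funcs}
    if len(active) < 2:
        return None
    per_terminal = []
    for tid, funcs in active.items():
        others = [similarity(funcs, other)
                  for other_id, other in active.items() if other_id != tid]
        per_terminal.append(sum(others) / len(others))
    return sum(per_terminal) / len(per_terminal)
```

With three active terminals using {SURF, ATLAS}, {SURF, ATLAS}, and {SURF}, the per-terminal averages under the assumed measure are 75, 75, and 50, so the cluster average is about 66.7. A fourth terminal with no workload would not change that value.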

The remaining geographical groups of terminals ranged
in size from 14 at Wright-Patterson AFB to 168 terminals at
AFMPC. These groups became input to the iterative similarity
analysis process. Each iteration of this process consisted
of the generation of a similarity table for the cluster, in
which users were included only if their similarity value
exceeded the target value used as input for the comparison
run. The comparisons normally began using a 90% minimum
similarity, and succeeding runs were made using lower target
levels after removing newly allocated terminals. In every
case, clusters were established with no more than 12
terminals, as required by the criterion threshold established
earlier. Eventually, the number of unallocated terminals was
reduced to a number which could be allocated manually, and a
single table was created with no lower limit on the
similarity value required. Included in these groups of users
were eight which had no recorded workload. Their allocation
to a system node was based on the user location or functional
affiliation, such as the building location or office symbol.
Thirty-two additional clusters were built using this iterative
process, resulting in a total of 44 clusters created after
the application of the locality and similarity value criteria.
All of these clusters are also documented in the listing of
clusters (Appendix C), with average cluster similarity values
computed.
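One way to picture the iterative process is the greedy Python sketch below. It is not the procedure implemented in Appendix B, where the analyst inspected each similarity table and chose the allocations. Here a seed terminal simply collects the unallocated terminals that meet the current threshold, up to the twelve-terminal limit, and the threshold is then lowered. The overlap measure, the threshold schedule after 90%, and the seed ordering are all assumptions.

```python
def similarity(a: set, b: set) -> float:
    """Assumed stand-in measure: percent overlap of two function sets."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


def iterative_clusters(terminals: dict, thresholds=(90, 75, 50, 25), max_size=12):
    """Greedy sketch of threshold-lowering cluster formation.

    `terminals` maps terminal ID -> set of system functions used.
    Returns (clusters, leftovers); leftovers are the terminals that
    would be allocated manually in the process described above.
    """
    unallocated = dict(terminals)
    clusters = []
    for threshold in thresholds:                 # start high, then relax
        for seed in sorted(unallocated):         # snapshot, safe to delete below
            if seed not in unallocated:
                continue                         # already placed in this pass
            seed_funcs = unallocated[seed]
            matches = [t for t in unallocated
                       if t != seed
                       and similarity(seed_funcs, unallocated[t]) >= threshold]
            if not matches:
                continue
            # Keep the closest matches first and respect the node size limit.
            matches.sort(key=lambda t: (-similarity(seed_funcs, unallocated[t]), t))
            members = [seed] + matches[:max_size - 1]
            clusters.append(members)
            for member in members:               # remove newly allocated terminals
                del unallocated[member]
    return clusters, sorted(unallocated)
```

Calling `iterative_clusters(group)` on one oversized location group returns the clusters formed at each threshold along with the terminals left for manual allocation.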

Synthesis 

The synthesis portion of the design process serves to
perform a "clean-up" function which identifies problem areas
in the design and corrects them. The allocation of terminals
with no workload could be included as part of this activity,
but has been described in the previous section since the
locality criterion was met. The major activity accomplished
to synthesize the design involved analyzing the remaining
unallocated terminals and determining the proper allocation
approach. The 31 terminals involved in the synthesis activity
are listed in Figure 4-3, including their locations. Their
wide geographic dispersion suggested that proximity to a
nearby cluster would be an appropriate first step, much like
the initial allocation criterion employed previously. A total
of 13 of these terminals were allocated based on this locality
criterion. They are indicated by the asterisk next to their
terminal identifiers in the figure. The remaining 18 terminals
did not easily conform to this type of allocation because
there were two or three terminals in the same area and
allocation to nearby nodes would increase the node size beyond
the threshold. These terminals were allocated to newly created
nodes called CENTRAL1 and CENTRAL2. Their allocation between
these two nodes was accomplished by applying both the locality
and similarity value criteria to achieve a "reasonable"
balance. The addition of these last two clusters completed
the design process with a total of 46 distributed system
clusters supporting a total of 425 terminals.
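The allocation rule applied during synthesis can be expressed as a short check. The Python sketch below assumes each dispersed group has at most one candidate nearby cluster and that the only test is the twelve-terminal threshold. The cluster names and sizes in the usage note are hypothetical, and the subsequent split of the pooled terminals between CENTRAL1 and CENTRAL2 is not modelled.

```python
def synthesize(groups, cluster_sizes, max_size=12):
    """Place dispersed terminal groups, or pool them for the central nodes.

    groups        -- list of (terminal_ids, nearby_cluster or None)
    cluster_sizes -- dict of existing cluster name -> current terminal count
    Returns (assigned, central_pool, sizes): terminal -> cluster mapping,
    the terminals left for the central nodes, and the updated cluster sizes.
    """
    sizes = dict(cluster_sizes)          # work on a copy
    assigned = {}
    central_pool = []
    for terminal_ids, nearby in groups:
        fits = (nearby is not None
                and sizes[nearby] + len(terminal_ids) <= max_size)
        if fits:
            sizes[nearby] += len(terminal_ids)
            for terminal in terminal_ids:
                assigned[terminal] = nearby
        else:
            # No nearby cluster, or joining it would exceed the threshold.
            central_pool.extend(terminal_ids)
    return assigned, central_pool, sizes
```

For instance, with hypothetical clusters NODE_A (9 terminals) and NODE_B (11 terminals), a single terminal near NODE_A is absorbed, while a three-terminal group near NODE_B would push that node to 14 and is pooled for the central nodes instead.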

Summary 

The distributed system design envisions a total of 46
terminal clusters supporting groups of terminals ranging in
size from 3 terminals at the Alaskan Air Command to 12
terminals in a cluster at Andrews Air Force Base. Although the
computed similarity values were important to the partitioning
and allocation activities, terminal locations overrode this
factor in many cases. Some of the allocation decisions
reached in the synthesis process may, under detailed analysis
of communications costs and capabilities, need to be revised.
Additionally, expansion of the descriptive data available
for use in the generation of similarity values, and functional
requirement changes which may occur over time, will
necessitate monitoring and possible adjustment of the
developed cluster designs.


V  Conclusions  and  Recommendations 


Conclusions

The purpose of this investigation was to perform an
analysis of the central site portion of the Personnel Data
System, propose a distributed processing system design
strategy, and develop a design. The analysis performed
provided a depiction of current system operations based on the
usage of system functions as portrayed in available system
documentation reports. The use of system functions was related
to the input source system terminals, and these relationships
were reflected in a user database described in Chapter IV.

The development of a distributed processing system
design strategy evolved from a review of recent publications
documenting current approaches in the subject area, and from
the constraints imposed by current system documentation and
the AFMPC hardware reacquisition project. The design strategy
described in Chapter IV sought to enable establishment of
clusters of system terminals based on a precedence application
of three criteria: terminal location, similarity of system
function use, and office location/symbol when allocation
could not be supported by either of the first two criteria.
An ideal cluster size of no more than 12 terminals was
established as a criterion threshold which would conform to
the equipment constraints imposed by the AFMPC reacquisition
program and provide future upgrade potential without
overwhelming equipment capabilities.

Working from the user database containing descriptive
information on 425 system terminals, 12 clusters based solely
on geographic location emerged. An additional 32 clusters
were built based on a similarity analysis within location
groupings. Two central clusters at AFMPC were established
to support 18 terminals that were not closely situated to any
other established cluster locations. The completed
distributed system design establishes a total of 46 clusters,
each described in Appendix C, including the computed
individual and cluster average similarity values.

General  Recommendations 

One obstacle continually encountered during the analysis
phase of the design process was the lack of the documentation
required to support a more complete design approach. The most
complete source of information depicting system function usage
was the TERMUSE report, which provides usage data for 14
categories of system functions. However, four of these
categories include lower levels of usage which are concealed
from analysis. In the case of the SURF, ATLAS, and PERSTRANS
functions, identification of the specific data file being
accessed is essential to an accurate portrayal of the actual
operation being performed by the users of each terminal.
Likewise, in the case of the "RUN PROG" function category,
more than forty different system functions are concealed
within this single column of information.

The batch processing nature of the PDS central site
removed several levels of complexity from the design strategy,
since data concurrency and other problems related to
processing in a "real-time" distributed processing system did
not have to be accommodated. In the literature reviewed, the
approach employed in response to these problems for SDD-1 was
very impressive. The use of conflict graph analysis during
the design of a system can significantly reduce the software
overhead associated with ensuring data integrity and
preventing deadlocks across a distributed system. Serious
consideration should be given to producing transaction class
information in future system development efforts. The
availability of such information would be a source of
consistent documentation and would serve as a starting point
for application of conflict analysis should system evolution
proceed towards establishment of a real-time distributed
processing system.
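The kind of transaction class information recommended here can be illustrated with a deliberately simplified Python sketch. It records a read set and a write set for each class and links two classes whenever one writes data the other reads or writes. This is a class-level simplification for illustration only, not the full conflict graph analysis procedure developed for SDD-1. The class names and file names are hypothetical.

```python
from itertools import combinations


def conflicts(a: dict, b: dict) -> bool:
    """True if either class writes data that the other reads or writes."""
    return bool(a["writes"] & (b["reads"] | b["writes"])
                or b["writes"] & a["reads"])


def conflict_graph(classes: dict) -> set:
    """Return the set of conflicting class pairs as alphabetically ordered tuples.

    `classes` maps a transaction class name to a dict holding two sets,
    'reads' and 'writes', of the data files the class touches.
    """
    edges = set()
    for first, second in combinations(sorted(classes), 2):
        if conflicts(classes[first], classes[second]):
            edges.add((first, second))
    return edges


# Hypothetical transaction classes and files, for illustration only.
example_classes = {
    "INQUIRY": {"reads": {"MASTER"}, "writes": set()},
    "UPDATE": {"reads": {"MASTER"}, "writes": {"MASTER"}},
    "REPORT": {"reads": {"HISTORY"}, "writes": set()},
}
example_edges = conflict_graph(example_classes)
```

In this example only INQUIRY and UPDATE are linked. A class with no edges, such as REPORT, is the kind of case where synchronization overhead could be avoided at design time.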

Specific  Recommendations 

Although the intent of this investigation was ultimately
to develop a distributed system design which could be
implemented at AFMPC, considerable additional work will be
required to demonstrate the validity of this or any other
distributed design. The following topical areas are of major
concern. They require further analysis and development to
enable successful development of a design ready to be
implemented.

System Report Requirements. The initial effort does
not directly relate to the distributed design. Expansion of
system reports to provide further detail of system function
usage by terminal users is essential. This documentation
would entail expansion of the TERMUSE report to include file
usage and further definition of the functions currently
included in the "RUN PROG" category of the report. This
expansion will provide the basis for generation of more
accurate representations of similarities in functional usage
for terminals in each cluster.

Data Usage Documentation. The second important effort
is the requirement to improve the accuracy of the cluster
development process based on inclusion of data usage
similarities between the cluster members. This information is
only partially available in the current system through the
file designation portion of the transaction identifiers shown
in the TERMSTRIP report. These file designations were included
in the system design developed, but only reflect file usage
under the PERSTRANS function. Complete documentation of
file usage would support creation of the data access matrices
described by Palmer (Ref 13:25-26). The system design can
then incorporate the proximity measures supported by Palmer.

Transaction Analysis Capabilities. Each transaction
within the PDS must be analyzed to accomplish a thorough
conflict analysis process. Identification of transaction
classes and placement of the transactions appropriately will
be a large undertaking due to the size of the system and the
individual sets of transactions for each of the subsystems.
This process can be standardized and expedited by development
of software that can process the data descriptor table
information maintained on disk at AFMPC. These tables could
be obtained in either magnetic tape or card form very readily,
and could be used as input to routines which would identify
transaction attributes and assign each transaction to an
appropriate class.

Network Analysis. Communications within the distributed
processing system is a vital concern, yet has been largely
ignored in this thesis because of a lack of familiarity with
the subject and the time required to perform a reasonable
network analysis. The only consideration of communications
emerged in the selection of user location as a primary
criterion for cluster allocations. Therefore, a major effort
yet to be accomplished is the analysis of the distributed
design on the basis of network capabilities and costs. As
mentioned earlier in Chapter II, such considerations should
be included in the analysis, partitioning, allocation, and
synthesis processes employed to achieve the distributed
design.

System Simulation. The final recommendation for future
work that would contribute to improvement and validation of
the design strategy concerns the development of a simulation
package. Prior to starting this thesis, the author undertook
two simulation projects which sought to simulate central site
operations both in the current environment and as the system
might exist under a distributed processing system structure.
These efforts encountered many of the same problems with
shortcomings in the detailed documentation required to perform
the level of analysis necessary for development of the design
strategy. Information showing current system performance
characteristics is not readily available and would require
the installation of software traps to collect data in most
cases. Such traps could be developed and used periodically
with the assistance of system managers at AFMPC, but the
benefits to be derived must justify such changes.

Any simulation package developed will be subject to
the problems of size, since the transaction processing
portion of the system involves many files, several hundred
terminals, and numerous unique application programs which
continually compete for system resources. Some of the specific
information requirements identified as necessary for
development of a simulation package include the following:

- Processing times (CPU) required for the various
segments of transaction processing activities must be obtained
for the various application programs.

- Transaction input rates, reject rates, and re-input
rates for system users must be collected and measured for
each of the various system functions based on such input
(i.e., SURF, PERSTRANS, and ATLAS).

- Descriptions of the user/system interaction must be
developed for the various system functions.

- Resource requirements and utilization statistics must
be obtained for the various system functions.

Summary 

The distributed system design developed is an initial
effort towards development of a design which could be
implemented. The size of the current system, the need for
expansion of descriptive system reports, and the collection
of detailed system statistics to support simulation of system
performance to validate the design provide the opportunity
for continuation of this project in a rapidly emerging
technological area. Early contact with AFMPC/MPCD, the
sponsor, must be established to develop a strategy for
implementing the system information sources required.
Additionally, any individual interested in undertaking the
project should have experience with an available simulation
language such as Q-GERT or SLAM.
Appendix A

Central  Site  Environment 


The  central  site  consists  of  two  B6700  computers  (A 
system  and  B  system)  and  one  H6068  computer  (C  system) . 

1.  The  A  system  consists  of  the  following  components: 

a.  3  central  processors 

b.  960K  memory 

c.  4  data  communications  processors 

d.  14  tape  drives 

e.  136  disk  drives  (dedicated,  switchable,  and 
shared) 

2.  The  B  system  consists  of  the  following  components: 

a.  3  central  processors 

b.  786K  memory 

c.  2  data  communication  processors 

d.  15  tape  drives 

e.  162  disk  drives  (dedicated,  switchable,  and 
shared) 

NOTE:  Total  disk  drives  =  236 

3.  The  C  system  consists  of  the  following  components: 

a.  1  central  processor 

b.  256K  memory 

c.  1  data  communications  processor 

d.  5  tape  drives 

e.  14  disk  drives 

A total of 502 permanent terminals were identified
from system documentation gathered for use in this thesis.
Of this number, 77 terminals were employed by PROMIS/PMS
users who were excluded from the distributed system design.
An additional 55 terminals, not part of the 502, are portables
which could not be reliably reflected as belonging to any
specific user consistently.
APPENDIX B


Database Software Listing


<**R-  *> 

PROGRAM MANAGER(INPUT/, OUTPUT, DATAFIL, MTRIXFIL) ;

(**********************************************************)
(*
(*  PROGRAM MANAGER WAS DEVELOPED BY CAPT. RONALD V. BRANDT
(*  AT THE AIR FORCE INSTITUTE OF TECHNOLOGY AS SUPPORTING
(*  SOFTWARE FOR THE DEVELOPMENT OF A DISTRIBUTED PROCESSING
(*  SYSTEM DESIGN THESIS PROJECT.
(*
(*  FUNCTION:  THIS PROGRAM SUPPORTS THE CREATION, MAINTENANCE,
(*  AND ANALYSIS OF A DATABASE.  AS PRESENTED, THIS DATABASE
(*  CONSISTS OF USER/TERMINAL RECORDS WHICH ARE DEFINED BY
(*  THE SYSTEM TRANSACTION/FUNCTION RECORDS LINKED TO IT.
(*  THE MENU OF AVAILABLE PROGRAM COMMANDS ARE LISTED BELOW:
(*
(*       READ INPUT          SUBCLUSTER
(*       SAVE DATA           CLUSTER
(*       ADD USER            COMPARE
(*       DEL USER            SORT USERS
(*       ADD TRANS           ADD OFFICE
(*       DEL TRANS           END
(*
(*  DOCUMENTATION:  EACH OF THE COMMANDS ARE DOCUMENTED AT
(*  THE BEGINNING OF THE APPROPRIATE SECTION OF CODE.
(*
(*  INPUT:  THIS PROGRAM OPERATES IN AN INTERACTIVE MODE
(*  BY PRESENTING THE USER WITH THE MENU LIST AND PROMPTING
(*  ANY INPUT REQUIRED.  THE USE OF THIS PROGRAM REQUIRES
(*  THE PRESENCE OF A LOCAL FILE CALLED 'DATAFIL' OR INITIAL
(*  CREATION OF THE FILE THROUGH USE OF THE 'ADD USER', 'ADD
(*  TRANS', AND THE 'SAVE DATA' COMMANDS.
(*
(*  OUTPUT:  THREE TYPES OF OUTPUTS OCCUR AS A RESULT OF THE
(*  USE OF THIS PROGRAM.  INTERACTIVE MESSAGES ARE DISPLAYED
(*  ON THE USERS TERMINAL;  OUTPUT OF THE FILE 'DATAFIL'
(*  IS ACCOMPLISHED UPON INPUT OF THE 'SAVE DATA' COMMAND;
(*  AND OUTPUT OF THE FILE 'MTRIXFIL' OCCURS AS A RESULT OF
(*  INPUT OF THE 'COMPARE' COMMAND.
(*
(**********************************************************)
LABEL
   10, 20, 80, 85,
   75, 99 ;


TYPE
   TRANSRECORD = ^TRANSENTRY ;
   TRANSENTRY = RECORD
      TRANSNAME : ALFA ;
      NEXTTRANS : TRANSRECORD ;
   END ;

   USERRECORD = ^USERENTRY ;
   USERENTRY = RECORD
      NAME : ALFA ;
      NEXT : USERRECORD ;
      OFFICE : ALFA ;
      USERTRANS : TRANSRECORD ;
      TRANSCOUNT : INTEGER ;
   END ;


VAR
   MTRIXFIL,
   DATAFIL : TEXT ;
   TRANSPOINTER : TRANSRECORD ;
   USERPOINTER,
   USERHEAD : USERRECORD ;
   CHANGE : BOOLEAN ;
   COMMAND,
   OFFICEID,
   USERID,
   TRANSID : ALFA ;
   CLUSTER : ARRAY[1..200] OF USERRECORD ;
   USERCOUNT : INTEGER ;

(**********************************************************)
(*
(*  PROCEDURE GETFROMFILE IS USED TO READ THE NEXT ALFA ENTRY
(*  FROM THE INPUT FILE "DATAFIL."  EACH ALFA ENTRY CONSISTS
(*  OF TEN CHARACTERS, AND THE TEN CHARACTERS READ ARE RE-
(*  TURNED TO THE CALLER THROUGH THE VARIABLE INCHAR.  CALLING
(*  FORMAT :  GETFROMFILE(X);  WHERE X HAS BEEN DECLARED AS A
(*  VARIABLE OF DATA TYPE ALFA.
(*
(**********************************************************)

PROCEDURE GETFROMFILE(VAR INCHAR : ALFA) ;
VAR
   I : INTEGER ;
   EMPTY : ALFA ;
BEGIN
   EMPTY := '          ' ;
   IF EOLN(DATAFIL) THEN BEGIN
      READLN(DATAFIL) ;
      INCHAR := EMPTY ;
   END
   ELSE BEGIN
      FOR I := 1 TO 10 DO
         IF EOLN(DATAFIL) THEN INCHAR[I] := ' '
         ELSE READ(DATAFIL, INCHAR[I]) ;
   END ;
END ;

(**********************************************************)
(*
(*  PROCEDURE GETALFA IS USED TO READ AN ALFA ENTRY TYPED IN
(*  OVER THE KEYBOARD.  CALLING FORMAT :  GETALFA(X);  WHERE X
(*  HAS BEEN DECLARED AS A VARIABLE OF DATA TYPE ALFA.
(*
(**********************************************************)

PROCEDURE GETALFA(VAR INALFA : ALFA) ;
VAR
   I : INTEGER ;
BEGIN
   FOR I := 1 TO 10 DO
      IF EOLN(INPUT) THEN INALFA[I] := ' '
      ELSE READ(INALFA[I]) ;
END ;


(**********************************************************)
(*
(*  PROCEDURE GETCOMMAND DISPLAYS ON THE TERMINAL THE MENU
(*  OF AVAILABLE PROGRAM COMMANDS AND CALLS PROCEDURE GETALFA
(*  TO GET THE USERS SELECTED COMMAND.
(*  CALLING FORMAT :  GETCOMMAND(X);  WHERE X HAS BEEN DECLARED
(*  AS A VARIABLE DATA TYPE ALFA.
(*
(**********************************************************)

PROCEDURE GETCOMMAND(VAR NEWCOMMAND : ALFA) ;
BEGIN
   WRITELN(' ENTER COMMAND - ADD USER') ;
   WRITELN('               - ADD TRANS') ;
   WRITELN('               - READ INPUT') ;
   WRITELN('               - SAVE DATA') ;
   WRITELN('               - ADD OFFICE') ;
   WRITELN('               - DEL TRANS') ;
   WRITELN('               - DEL USER') ;
   WRITELN('               - END') ;
   WRITELN('               - CLUSTER') ;
   WRITELN('               - SUBCLUSTER') ;
   WRITELN('               - SORT USERS') ;
   WRITELN('               - COMPARE') ;
   WRITELN ;
   READLN ;
   GETALFA(NEWCOMMAND)
END ;

(**********************************************************)
(*
(*  PROCEDURE PUTALFA WRITES TO THE TEXTFILE DATAFIL THE ALFA
(*  DATA ITEM PASSED TO IT IN THE CALL.
(*  CALLING FORMAT :  PUTALFA(X) ;  WHERE X HAS BEEN DECLARED
(*  AS A VARIABLE OF DATA TYPE ALFA.
(*
(**********************************************************)

PROCEDURE PUTALFA(OUTALFA : ALFA) ;
VAR
   I : INTEGER ;
BEGIN
   WRITE(DATAFIL, OUTALFA) ;
END ;
(**********************************************************)
(*
(*  PROCEDURE SAVEDATA WRITES THE CONTENTS OF THE USER DATA-
(*  BASE TO THE OUTPUT FILE DATAFIL.  OUTPUT BEGINS WITH THE
(*  USER POINTED TO BY USERHEAD AND CONTINUES UNTIL ALL USERS
(*  AND THEIR DATA HAVE BEEN WRITTEN.  TO ALLOW DISPLAY OF THE
(*  OUTPUT FILE ON A TERMINAL SCREEN, EACH LINE OF THE FILE
(*  HAS NO MORE THAN 80 CHARACTERS OF DATA.  THE NAME,
(*  OFFICE ID (IF FIRST LINE FOR THAT USER), THE COUNT OF THE
(*  NUMBER OF SYSTEM FUNCTIONS/TRANSACTIONS ON THE REMAINDER
(*  OF THE LINE, AND THE FUNCTION OR TRANSACTION ID'S.  A
(*  COUNT OF THE NUMBER OF USERS WRITTEN TO THE FILE IS OUT-
(*  PUT TO THE TERMINAL AT THE COMPLETION OF THE SAVE PRO-
(*  CEDURE.  CALLING FORMAT :  SAVEDATA ;
(*
(**********************************************************)

PROCEDURE SAVEDATA ;
CONST
   MAXCOUNT = 5 ;
VAR
   CH : CHAR ;
   TEMPCOUNT, J : INTEGER ;
   TEMPUSER, CURRENTUSER : USERRECORD ;
   CURRENTTRANS : TRANSRECORD ;
BEGIN
   REWRITE(DATAFIL) ;
   CH := ' ' ;
   CURRENTUSER := USERHEAD ;
   WRITELN(' INCOMING USER COUNT: ', USERCOUNT : 3) ;
   USERCOUNT := 0 ;
   WHILE CURRENTUSER <> NIL DO
   BEGIN
      TEMPUSER := CURRENTUSER ;
      USERCOUNT := USERCOUNT + 1 ;
      TEMPCOUNT := CURRENTUSER^.TRANSCOUNT ;
      CURRENTTRANS := TEMPUSER^.USERTRANS ;
      WRITE(DATAFIL, CURRENTUSER^.NAME) ;
      WRITE(DATAFIL, CURRENTUSER^.OFFICE) ;
      IF TEMPCOUNT = 0 THEN WRITE(DATAFIL, ' 0 ') ;
      WHILE TEMPCOUNT > 0 DO BEGIN
         IF TEMPCOUNT > MAXCOUNT THEN
         BEGIN
            TEMPCOUNT := TEMPCOUNT - MAXCOUNT ;
            WRITE(DATAFIL, MAXCOUNT:1) ;
            FOR J := 1 TO MAXCOUNT DO BEGIN
               WRITE(DATAFIL, CURRENTTRANS^.TRANSNAME) ;
               CURRENTTRANS := CURRENTTRANS^.NEXTTRANS ;
            END ;
            WRITELN(DATAFIL) ;
            WRITE(DATAFIL, CURRENTUSER^.NAME) ;
         END
         ELSE BEGIN
            WRITE(DATAFIL, ' ', TEMPCOUNT:1, ' ') ;
            FOR J := 1 TO TEMPCOUNT DO BEGIN
               WRITE(DATAFIL, CURRENTTRANS^.TRANSNAME) ;
               CURRENTTRANS := CURRENTTRANS^.NEXTTRANS ;
            END ;
            TEMPCOUNT := 0 ;
         END ;  (* ELSE *)
      END ;  (* WHILE *)
      CURRENTUSER := TEMPUSER^.NEXT ;
      WRITELN(DATAFIL) ;
   END ;
   WRITELN(DATAFIL, ' END ') ;
   RESET(DATAFIL) ;
   WRITELN(' OUTPUT USER COUNT: ', USERCOUNT : 3) ;
END ;



(****************************************************************)
(*                                                              *)
(*  PROCEDURE GETOFFICEID IS USED TO REQUEST INPUT OF A USER'S  *)
(*  OFFICE ID FROM THE TERMINAL.                                *)
(*  CALLING FORMAT:  GETOFFICEID(X) ;                           *)
(*  WHERE X HAS BEEN DECLARED AS A VARIABLE OF DATA TYPE ALFA.  *)
(*                                                              *)
(****************************************************************)

PROCEDURE GETOFFICEID(VAR NEWOFFICE : ALFA) ;
BEGIN
  WRITELN(' ENTER OFFICE ID OR END') ;
  READLN ;
  GETALFA(NEWOFFICE) ;
END ;

(****************************************************************)
(*                                                              *)
(*  PROCEDURE GETUSERID IS USED TO REQUEST INPUT OF A USERID    *)
(*  FROM THE TERMINAL.  CALLING FORMAT :  GETUSERID(X) ;  WHERE *)
(*  X HAS BEEN DECLARED AS A VARIABLE OF DATA TYPE ALFA.        *)
(*                                                              *)
(****************************************************************)

PROCEDURE GETUSERID(VAR NEWUSER : ALFA) ;
BEGIN
  WRITELN(' ENTER USER ID OR END') ;
  READLN ;
  GETALFA(NEWUSER) ;
END ;

(****************************************************************)
(*                                                              *)
(*  FUNCTION DUPLICATEUSER IS A BOOLEAN FUNCTION USED TO        *)
(*  DETERMINE IF THE USERID PASSED TO THE FUNCTION IS CURRENTLY *)
(*  PART OF THE USER DATABASE.  IF A MATCH OF THE CHECKID AND   *)
(*  CURRENT USERID IS FOUND, DUPLICATEUSER IS SET TRUE AND      *)
(*  THE RECORD POINTER IS RETURNED POINTING TO THE MATCHED      *)
(*  RECORD.  CALLING FORMAT :  IF DUPLICATEUSER(X,Y) THEN ;     *)
(*  WHERE X IS A VARIABLE OF DATA TYPE ALFA CONTAINING THE      *)
(*  USERID TO CHECK, AND Y HAS BEEN DECLARED AS A VARIABLE      *)
(*  OF DATA TYPE USERRECORD.                                    *)
(*                                                              *)
(****************************************************************)

FUNCTION DUPLICATEUSER(CHECKID : ALFA ; VAR USERNAME : USERRECORD)
                       : BOOLEAN ;
VAR
  TEMPCHECK : BOOLEAN ;
BEGIN
  TEMPCHECK := FALSE ;
  IF USERHEAD <> NIL THEN BEGIN
    USERNAME := USERHEAD ;
    TEMPCHECK := USERNAME^.NAME = CHECKID ;
    WHILE (USERNAME^.NAME <> CHECKID) AND
          (USERNAME^.NEXT <> NIL) DO
    BEGIN
      USERNAME := USERNAME^.NEXT ;
      TEMPCHECK := USERNAME^.NAME = CHECKID ;
    END ;
  END ;
  DUPLICATEUSER := TEMPCHECK ;
END ;


(****************************************************************)
(*                                                              *)
(*  PROCEDURE ADDOFFICE INSERTS THE OFFICE ID (PASSED TO IT)    *)
(*  INTO ALL USER RECORDS SPECIFIED FROM THE TERMINAL.          *)
(*  TERMINAL INPUT OF USERID IS REQUESTED AND THE OFFICE ID IS  *)
(*  INSERTED UNTIL THE COMMAND "END" IS INPUT.                  *)
(*  CALLING FORMAT :  ADDOFFICE(X) ;  WHERE X IS THE OFFICE ID  *)
(*  VARIABLE OF DATA TYPE ALFA.                                 *)
(*                                                              *)
(****************************************************************)

PROCEDURE ADDOFFICE(USEROFFICE : ALFA) ;
BEGIN
  WRITELN(' ENTER USERID FOR OFFICE, OR END') ;
  READLN ;
  GETALFA(USERID) ;
  WHILE NOT (USERID = 'END       ') DO BEGIN
    IF (DUPLICATEUSER(USERID, USERPOINTER)) THEN
      USERPOINTER^.OFFICE := USEROFFICE ;
    WRITELN(' ENTER USERID FOR OFFICE, OR END') ;
    READLN ;
    GETALFA(USERID) ;
  END ;
END ;

(****************************************************************)
(*                                                              *)
(*  PROCEDURE DELETEUSER IS USED TO REMOVE A USER RECORD        *)
(*  FROM THE USER DATABASE.                                     *)
(*  CALLING FORMAT :  DELETEUSER(X) ;  WHERE X IS A VARIABLE OF *)
(*  DATA TYPE USERRECORD POINTING TO THE USER RECORD TO BE      *)
(*  DELETED.                                                    *)
(*                                                              *)
(****************************************************************)

PROCEDURE DELETEUSER(OLDUSER : USERRECORD) ;
VAR
  TEMPUSER : USERRECORD ;
BEGIN
  TEMPUSER := USERHEAD ;
  IF TEMPUSER = OLDUSER THEN BEGIN
    USERHEAD := TEMPUSER^.NEXT ;
    DISPOSE(OLDUSER) ;
  END
  ELSE BEGIN
    WHILE (TEMPUSER^.NEXT <> OLDUSER) AND
          (TEMPUSER^.NEXT <> NIL) DO
      TEMPUSER := TEMPUSER^.NEXT ;
    IF TEMPUSER^.NEXT = OLDUSER THEN BEGIN
      TEMPUSER^.NEXT := OLDUSER^.NEXT ;
      DISPOSE(OLDUSER) ;
    END  (* IF *)
  END ;  (* ELSE *)
END ;


(****************************************************************)
(*                                                              *)
(*  PROCEDURE BUILDUSER CREATES USER RECORDS FOR THE USERIDS    *)
(*  BROUGHT TO THE PROCEDURE, AND LINKS THE RECORD INTO THE     *)
(*  DATABASE.  THE VARIABLES USERPOINTER AND USERHEAD POINT     *)
(*  TO THE NEW RECORD AT THE COMPLETION OF THE PROCEDURE.       *)
(*  CALLING FORMAT :  BUILDUSER(X) ;  WHERE X IS A VARIABLE OF  *)
(*  DATA TYPE ALFA CONTAINING THE NEW USERID.                   *)
(*                                                              *)
(****************************************************************)

PROCEDURE BUILDUSER(NEWID : ALFA) ;
BEGIN
  NEW(USERPOINTER) ;
  USERPOINTER^.NEXT := USERHEAD ;
  USERHEAD := USERPOINTER ;
  USERPOINTER^.NAME := NEWID ;
  USERHEAD^.USERTRANS := NIL ;
  USERHEAD^.TRANSCOUNT := 0 ;
  USERHEAD^.OFFICE := '          ' ;
  USERHEAD^.CLUSTERMEMBER := FALSE ;
END ;

(****************************************************************)
(*                                                              *)
(*  PROCEDURE BUILDTRANS CREATES NEW TRANSACTION RECORDS,       *)
(*  AND LINKS THEM TO THE USER RECORD IN PARENTPOINTER.  THE    *)
(*  TRANSACTION RECORD IS POINTED TO BY THE GLOBAL VARIABLE     *)
(*  TRANSPOINTER UPON COMPLETION OF THE PROCEDURE.              *)
(*  CALLING FORMAT :  BUILDTRANS(X,Y) ;  WHERE X IS AN ALFA     *)
(*  VARIABLE CONTAINING THE TRANSACTION ID, AND Y IS OF         *)
(*  TYPE USERRECORD POINTING TO THE USER FOR THIS               *)
(*  TRANSACTION.                                                *)
(*                                                              *)
(****************************************************************)

PROCEDURE BUILDTRANS(ADDTRANS : ALFA ; PARENTPOINTER : USERRECORD) ;
BEGIN
  NEW(TRANSPOINTER) ;
  TRANSPOINTER^.TRANSNAME := ADDTRANS ;
  TRANSPOINTER^.NEXTTRANS := PARENTPOINTER^.USERTRANS ;
  PARENTPOINTER^.USERTRANS := TRANSPOINTER ;
  PARENTPOINTER^.TRANSCOUNT := PARENTPOINTER^.TRANSCOUNT + 1 ;
END ;

(****************************************************************)
(*                                                              *)
(*  PROCEDURE SORTUSERS IS USED TO SORT THE USER DATABASE IN    *)
(*  ASCENDING ORDER BASED ON USERID (NAME).  THE ALGORITHM      *)
(*  IMPLEMENTED IS AN EXCHANGE SELECTION SORT.  THIS PROCEDURE  *)
(*  IS INVOKED BY INPUT OF THE "SORT USERS" COMMAND.            *)
(*  CALLING FORMAT :  SORTUSERS ;                               *)
(*                                                              *)
(****************************************************************)

PROCEDURE SORTUSERS ;
VAR
  PREVPOINTER,
  TOPPOINTER, CURRPOINTER : USERRECORD ;
  CHANGE : BOOLEAN ;

  PROCEDURE FINDSMALLEST ;
  BEGIN
    PREVPOINTER := USERPOINTER ;
    WHILE CURRPOINTER <> NIL DO BEGIN
      IF (CURRPOINTER^.NAME < USERPOINTER^.NAME) THEN BEGIN
        TOPPOINTER^.NEXT := CURRPOINTER ;
        PREVPOINTER^.NEXT := CURRPOINTER^.NEXT ;
        CURRPOINTER^.NEXT := USERPOINTER ;
        CURRPOINTER := USERPOINTER ;
        USERPOINTER := TOPPOINTER^.NEXT ;
        CHANGE := TRUE ;
      END
      ELSE PREVPOINTER := CURRPOINTER ;
      CURRPOINTER := CURRPOINTER^.NEXT ;
    END ;  (* WHILE *)
  END ;  (* FINDSMALLEST *)

BEGIN
  TOPPOINTER := USERHEAD ;
  CURRPOINTER := USERHEAD^.NEXT ;
  IF (CURRPOINTER^.NAME < USERHEAD^.NAME) THEN BEGIN
    USERHEAD := CURRPOINTER ;
    TOPPOINTER^.NEXT := CURRPOINTER^.NEXT ;
    CURRPOINTER^.NEXT := TOPPOINTER ;
  END ;
  TOPPOINTER := USERHEAD ;
  USERPOINTER := USERHEAD^.NEXT ;
  CURRPOINTER := USERPOINTER^.NEXT ;
  FINDSMALLEST ;
  IF (TOPPOINTER^.NAME > USERPOINTER^.NAME) THEN BEGIN
    USERHEAD := USERPOINTER ;
    TOPPOINTER^.NEXT := USERPOINTER^.NEXT ;
    USERPOINTER^.NEXT := TOPPOINTER ;
  END ;
  TOPPOINTER := USERHEAD ;
  USERPOINTER := USERHEAD^.NEXT ;
  CURRPOINTER := USERPOINTER^.NEXT ;
  CHANGE := TRUE ;
  WHILE (CURRPOINTER^.NEXT <> NIL) DO BEGIN
    CHANGE := FALSE ;
    FINDSMALLEST ;
    TOPPOINTER := TOPPOINTER^.NEXT ;
    USERPOINTER := TOPPOINTER^.NEXT ;
    CURRPOINTER := USERPOINTER^.NEXT ;
  END ;
END ;

(****************************************************************)
(*                                                              *)
(*  PROCEDURE READOLDFILE IS USED TO READ THE EXISTING USER     *)
(*  DATABASE FROM FILE DATAFIL, AND CREATE THE LINKED DATA      *)
(*  STRUCTURE EMPLOYED FOR FILE MAINTENANCE.  THE NUMBER        *)
(*  OF USER RECORDS READ IN IS DISPLAYED ON THE TERMINAL UPON   *)
(*  COMPLETION.  CALLING FORMAT :  READOLDFILE ;                *)
(*  PROCEDURE IS USED IN RESPONSE TO THE "READ INPUT" COMMAND.  *)
(*                                                              *)
(****************************************************************)

PROCEDURE READOLDFILE ;
LABEL 40, 50 ;
VAR CURRENTUSER : ALFA ;
    J : INTEGER ;
    CH : CHAR ;
    TEMPCOUNT : INTEGER ;
BEGIN
  RESET(DATAFIL) ;
  USERCOUNT := 0 ;
  CH := ' ' ;
  CURRENTUSER := '          ' ;
  WHILE NOT EOF(DATAFIL) DO
  BEGIN
    GETFROMFILE(USERID) ;
    IF USERID = CURRENTUSER THEN GOTO 40 ;
    CURRENTUSER := USERID ;
    IF USERID = 'END       ' THEN BEGIN
      WRITELN(' REACHED END OF INPUT FILE.') ;
      GOTO 50 ;  END
    ELSE BUILDUSER(USERID) ;
    USERCOUNT := USERCOUNT + 1 ;
    GETFROMFILE(USERPOINTER^.OFFICE) ;
40: READ(DATAFIL, CH, TEMPCOUNT) ;
    IF TEMPCOUNT <> 0 THEN READ(DATAFIL, CH) ;
    WHILE TEMPCOUNT <> 0 DO BEGIN
      TEMPCOUNT := TEMPCOUNT - 1 ;
      GETFROMFILE(TRANSID) ;
      BUILDTRANS(TRANSID, USERPOINTER) ;
    END ;
    READLN(DATAFIL) ;
  END ;
50: WRITELN(' COUNT OF INCOMING USERS: ', USERCOUNT : 3) ;
END ;

(****************************************************************)
(*                                                              *)
(*  PROCEDURE GETTRANSID REQUESTS INPUT OF A TRANSACTION        *)
(*  IDENTIFIER AND INVOKES PROCEDURE GETALFA TO READ KEYBOARD   *)
(*  INPUT INTO THE ALFA VARIABLE NEWTRANS.                      *)
(*  CALLING FORMAT:  GETTRANSID(X)                              *)
(*  WHERE X IS A VARIABLE OF DATA TYPE ALFA.                    *)
(*                                                              *)
(****************************************************************)

PROCEDURE GETTRANSID(VAR NEWTRANS : ALFA) ;
BEGIN
  WRITELN(' ENTER TRANSACTION ID OR "END"') ;
  READLN ;
  GETALFA(NEWTRANS) ;
END ;

(****************************************************************)
(*                                                              *)
(*  BOOLEAN FUNCTION DUPLICATETRANS IS USED TO DETERMINE        *)
(*  WHETHER THE USER POINTED TO BY THE VARIABLE BROUGHT TO      *)
(*  THE FUNCTION CURRENTLY HAS THE TRANSACTION IDENTIFIER ALSO  *)
(*  BROUGHT TO THE FUNCTION.                                    *)
(*  CALLING FORMAT :  IF DUPLICATETRANS(X,Y) THEN ;  WHERE X IS *)
(*  AN ALFA VARIABLE CONTAINING THE TRANSACTION IDENTIFIER      *)
(*  TO BE CHECKED FOR, AND Y POINTS TO THE USER RECORD TO       *)
(*  BE SEARCHED.                                                *)
(*                                                              *)
(****************************************************************)

FUNCTION DUPLICATETRANS(CHECKID : ALFA ; CURRUSER : USERRECORD)
                        : BOOLEAN ;
VAR
  TEMPTRAN : TRANSRECORD ;
  I : INTEGER ;
  FOUND : BOOLEAN ;
BEGIN
  TEMPTRAN := CURRUSER^.USERTRANS ;
  FOUND := FALSE ;
  WHILE NOT (FOUND) AND (TEMPTRAN <> NIL) DO BEGIN
    FOUND := TEMPTRAN^.TRANSNAME = CHECKID ;
    TEMPTRAN := TEMPTRAN^.NEXTTRANS ;
  END ;
  DUPLICATETRANS := FOUND ;
END ;

(****************************************************************)
(*                                                              *)
(*  PROCEDURE BUILDMATRIX READS THE USER DATABASE COMPARING     *)
(*  THE TRANSACTIONS AND SYSTEM FUNCTIONS USERS HAVE IN         *)
(*  COMMON TO COMPUTE A SIMILARITY VALUE.  THIS VALUE           *)
(*  AND THE USER NAMES ARE WRITTEN TO FILE MTRIXFIL IF THE      *)
(*  SIMILARITY VALUE IS GREATER THAN THE CUTOFF VALUE           *)
(*  (SIMLIMIT).  THIS PROCEDURE IS INVOKED BY INPUT OF THE      *)
(*  "COMPARE" COMMAND.                                          *)
(*                                                              *)
(****************************************************************)

PROCEDURE BUILDMATRIX ;

CONST
  SIMLIMIT = 0.5 ;

LABEL
  11, 12, 13 ;

VAR
  MATCHPOINTER : USERRECORD ;
  TOTALCOUNT,
  COMMONCOUNT, J : INTEGER ;
  SIMILARITY : REAL ;

BEGIN
  REWRITE(MTRIXFIL) ;
  WRITELN(MTRIXFIL, ' USER SIMILARITY TABLE - ',
          USERHEAD^.OFFICE) ;
  USERPOINTER := USERHEAD ;
  WHILE USERPOINTER <> NIL DO BEGIN
    IF (USERPOINTER^.CLUSTERMEMBER) THEN GOTO 12 ;
    MATCHPOINTER := USERHEAD ;
    WRITE(MTRIXFIL, USERPOINTER^.NAME) ;
    IF USERPOINTER^.TRANSCOUNT = 0 THEN BEGIN
      WRITE(MTRIXFIL, ' NO TRANSACTIONS.') ;
      GOTO 11 ;
    END ;
    WRITELN(MTRIXFIL) ;
    J := 0 ;
    WHILE MATCHPOINTER <> NIL DO BEGIN
      IF (MATCHPOINTER^.CLUSTERMEMBER) THEN GOTO 13 ;
      IF MATCHPOINTER = USERPOINTER THEN
        COMMONCOUNT := USERPOINTER^.TRANSCOUNT
      ELSE BEGIN
        TRANSPOINTER := USERPOINTER^.USERTRANS ;
        COMMONCOUNT := 0 ;
        WHILE TRANSPOINTER <> NIL DO BEGIN
          IF (DUPLICATETRANS(TRANSPOINTER^.TRANSNAME,
                             MATCHPOINTER))
            THEN COMMONCOUNT := COMMONCOUNT + 1 ;
          TRANSPOINTER := TRANSPOINTER^.NEXTTRANS ;
        END ;  (* WHILE *)
      END ;  (* ELSE *)
      TOTALCOUNT := USERPOINTER^.TRANSCOUNT +
                    MATCHPOINTER^.TRANSCOUNT ;
      SIMILARITY := COMMONCOUNT / (TOTALCOUNT - COMMONCOUNT) ;
      IF SIMILARITY > SIMLIMIT THEN BEGIN
        WRITE(MTRIXFIL, MATCHPOINTER^.NAME,
              SIMILARITY:4:2, ' ') ;
        J := J + 1 ;
      END ;
      IF J = 4 THEN BEGIN
        WRITELN(MTRIXFIL) ;
        J := 0 ;
      END ;
13:   MATCHPOINTER := MATCHPOINTER^.NEXT ;
    END ;  (* WHILE *)
11: WRITELN(MTRIXFIL) ;
12: USERPOINTER := USERPOINTER^.NEXT ;
  END ;  (* WHILE *)
END ;

(****************************************************************)
(*                                                              *)
(*  PROCEDURE DELETETRANS DISPOSES OF TRANSACTION RECORDS       *)
(*  BELONGING TO A SPECIFIC USER RECORD.                        *)
(*  CALLING FORMAT :  DELETETRANS(X,Y) ;  WHERE X IS A VARIABLE *)
(*  CONTAINING THE USER RECORD POINTER FROM WHICH THE           *)
(*  TRANSACTION IS TO BE DELETED AND Y IS THE TRANSACTION ID.   *)
(*                                                              *)
(****************************************************************)

PROCEDURE DELETETRANS(OLDUSER : USERRECORD ; CHECKID : ALFA) ;
VAR
  FOUND : BOOLEAN ;
  PREVIOUSTRANS, TEMPTRANS : TRANSRECORD ;
BEGIN
  TEMPTRANS := OLDUSER^.USERTRANS ;
  IF TEMPTRANS <> NIL THEN BEGIN
    IF TEMPTRANS^.TRANSNAME = CHECKID THEN BEGIN
      OLDUSER^.USERTRANS := TEMPTRANS^.NEXTTRANS ;
      WRITELN(' TRANSACTION DELETED- ', CHECKID) ;
      OLDUSER^.TRANSCOUNT := OLDUSER^.TRANSCOUNT - 1 ;
      DISPOSE(TEMPTRANS) ;  END
    ELSE BEGIN
      FOUND := FALSE ;
      WHILE (TEMPTRANS <> NIL) AND (FOUND <> TRUE) DO BEGIN
        PREVIOUSTRANS := TEMPTRANS ;
        TEMPTRANS := TEMPTRANS^.NEXTTRANS ;
        IF TEMPTRANS^.TRANSNAME = CHECKID THEN BEGIN
          PREVIOUSTRANS^.NEXTTRANS :=
            TEMPTRANS^.NEXTTRANS ;
          WRITELN(' TRANSACTION DELETED- ', CHECKID) ;
          OLDUSER^.TRANSCOUNT := OLDUSER^.TRANSCOUNT - 1 ;
          DISPOSE(TEMPTRANS) ;
          FOUND := TRUE ;  END
      END ;  (* WHILE *)
    END ;  (* ELSE *)
  END ;  (* IF *)
END ;  (* DELETETRANS *)

(****************************************************************)
(*                                                              *)
(*  PROCEDURE LISTCLUSTER DISPLAYS ON THE TERMINAL THE USER     *)
(*  IDENTIFIERS IN THE CURRENT CLUSTER GROUPING.                *)
(*  CALLING FORMAT :  LISTCLUSTER ;                             *)
(*                                                              *)
(****************************************************************)

PROCEDURE LISTCLUSTER ;
VAR
  INDEX : INTEGER ;
BEGIN
  INDEX := 1 ;
  WRITELN ;
  WRITELN(' FOLLOWING USERS HAVE COMMON TRANSACTIONS -') ;
  WHILE CLUSTER[INDEX] <> NIL DO BEGIN
    WRITE(' ', CLUSTER[INDEX]^.NAME) ;
    IF INDEX MOD 6 = 0 THEN WRITELN ;
    INDEX := INDEX + 1 ;
  END ;
END ;

(****************************************************************)
(*                                                              *)
(*  PROCEDURE BUILDCLUSTER DELETES FROM THE WORKING USER        *)
(*  DATABASE ALL USERS THAT DO NOT HAVE SPECIFIC TRANSACTIONS   *)
(*  OR SYSTEM FUNCTIONS IN COMMON WITH OTHER USERS.  TERMINAL   *)
(*  INPUT SPECIFIES THE TRANSACTIONS/FUNCTIONS TO BE CHECKED    *)
(*  FOR.  THIS PROCESS CAN BE CONTINUED INDEFINITELY, WITH THE  *)
(*  USERS STILL IN THE DATABASE BEING DISPLAYED EACH TIME       *)
(*  NEW SPECIFICATIONS HAVE BEEN APPLIED.                       *)
(*                                                              *)
(****************************************************************)

PROCEDURE BUILDCLUSTER ;
VAR
  OLDINDEX, NEWINDEX : INTEGER ;
  NEXTUSER : USERRECORD ;
BEGIN
  GETTRANSID(TRANSID) ;
  NEWINDEX := 1 ;
  NEXTUSER := USERHEAD ;
  WHILE NEXTUSER <> NIL DO BEGIN
    IF (DUPLICATETRANS(TRANSID, NEXTUSER)) THEN BEGIN
      CLUSTER[NEWINDEX] := NEXTUSER ;
      NEWINDEX := NEWINDEX + 1 ;
    END ;
    NEXTUSER := NEXTUSER^.NEXT ;
  END ;
  CLUSTER[NEWINDEX] := NIL ;
  GETTRANSID(TRANSID) ;
  WHILE TRANSID <> 'END       ' DO BEGIN
    NEWINDEX := 1 ;
    OLDINDEX := 1 ;
    NEXTUSER := CLUSTER[OLDINDEX] ;
    WHILE NEXTUSER <> NIL DO BEGIN
      IF (DUPLICATETRANS(TRANSID, NEXTUSER)) THEN BEGIN
        CLUSTER[NEWINDEX] := CLUSTER[OLDINDEX] ;
        NEWINDEX := NEWINDEX + 1 ;
      END ;
      OLDINDEX := OLDINDEX + 1 ;
      NEXTUSER := CLUSTER[OLDINDEX] ;
    END ;
    CLUSTER[NEWINDEX] := NIL ;
    LISTCLUSTER ;
    GETTRANSID(TRANSID) ;
  END ;
END ;

(****************************************************************)
(*                                                              *)
(*  PROCEDURE BUILDSUBCLUSTER REDUCES THE CURRENT WORK          *)
(*  DATABASE TO ONLY ONE GROUP OF USER RECORDS BASED ON         *)
(*  THE OFFICE IDENTIFIER INPUT BY THE USER.  THE DATABASE      *)
(*  IS DESTROYED DURING THIS OPERATION; THEREFORE CARE MUST     *)
(*  BE TAKEN TO PRECLUDE INADVERTENTLY DESTROYING A DATABASE    *)
(*  COPY THAT HAS NOT BEEN PROTECTED PROPERLY.                  *)
(*  CALLING FORMAT:  BUILDSUBCLUSTER ;                          *)
(*                                                              *)
(****************************************************************)

PROCEDURE BUILDSUBCLUSTER ;
LABEL 2 ;
VAR
  TEMPUSER, LASTENTRY : USERRECORD ;
BEGIN
  GETOFFICEID(OFFICEID) ;
  IF OFFICEID = 'END       ' THEN GOTO 2 ;
  TEMPUSER := USERHEAD ;
  WHILE (TEMPUSER^.OFFICE <> OFFICEID) DO
    TEMPUSER := TEMPUSER^.NEXT ;
  USERHEAD := TEMPUSER ;
  TEMPUSER := TEMPUSER^.NEXT ;
  LASTENTRY := USERHEAD ;
  WHILE (TEMPUSER <> NIL) DO BEGIN
    IF TEMPUSER^.OFFICE = OFFICEID THEN BEGIN
      LASTENTRY^.NEXT := TEMPUSER ;
      LASTENTRY := TEMPUSER ;
    END ;
    TEMPUSER := TEMPUSER^.NEXT ;
  END ;
  LASTENTRY^.NEXT := NIL ;
2: END ;

BEGIN  (* MAIN *)
  USERHEAD := NIL ;
10:
  GETCOMMAND(COMMAND) ;
  IF COMMAND = 'ADD USER  ' THEN BEGIN
20: GETUSERID(USERID) ;
    IF USERID = 'END       ' THEN GOTO 10 ;
    IF DUPLICATEUSER(USERID, USERPOINTER) THEN BEGIN
      WRITELN(' USER ALREADY EXISTS') ;
      GOTO 20 ;
    END ;
    BUILDUSER(USERID) ;
    GOTO 20 ;
  END
  ELSE BEGIN
    IF COMMAND = 'ADD TRANS ' THEN BEGIN
      GETUSERID(USERID) ;
      IF NOT DUPLICATEUSER(USERID, USERPOINTER) THEN BEGIN
        WRITELN(' USER DOES NOT EXIST ********** ') ;
        GOTO 10 ;
      END ;
80:
      GETTRANSID(TRANSID) ;
      IF TRANSID = 'END       ' THEN GOTO 10 ;
      IF (DUPLICATETRANS(TRANSID, USERPOINTER)) THEN BEGIN
        WRITELN(' USER ALREADY HAS THIS TRANSACTION') ;
        GOTO 80 ;
      END ;
      BUILDTRANS(TRANSID, USERPOINTER) ;
      GOTO 80 ;
    END
    ELSE BEGIN
      IF COMMAND = 'END       ' THEN
        GOTO 99
      ELSE BEGIN
        IF COMMAND = 'READ INPUT' THEN BEGIN
          READOLDFILE ;
          GOTO 10
        END
        ELSE BEGIN
          IF COMMAND = 'SAVE DATA ' THEN BEGIN
            SAVEDATA ;
            GOTO 99 ;
          END
          ELSE BEGIN
            IF COMMAND = 'DEL TRANS ' THEN BEGIN
              GETUSERID(USERID) ;
              IF NOT DUPLICATEUSER(USERID, USERPOINTER)
              THEN BEGIN
                WRITELN(' USER DOES NOT EXIST ') ;
                GOTO 10 ;
              END ;
75:           GETTRANSID(TRANSID) ;
              IF TRANSID = 'END       ' THEN GOTO 10 ;
              DELETETRANS(USERPOINTER, TRANSID) ;
              GOTO 75 ;
            END
            ELSE BEGIN
              IF COMMAND = 'DEL USER  ' THEN BEGIN
                GETUSERID(USERID) ;
                IF DUPLICATEUSER(USERID, USERPOINTER)
                THEN BEGIN
                  WRITELN(' USER DELETED- ', USERID) ;
                  DELETEUSER(USERPOINTER) ;
                  GOTO 10 ;
                END ;
                WRITELN(' USER DOES NOT EXIST') ;
              END
              ELSE BEGIN
                IF COMMAND = 'CLUSTER   ' THEN
                  BUILDCLUSTER
                ELSE BEGIN
                  IF COMMAND = 'ADD OFFICE' THEN BEGIN
85:                 GETOFFICEID(OFFICEID) ;
                    IF NOT (OFFICEID = 'END       ')
                    THEN BEGIN
                      ADDOFFICE(OFFICEID) ;
                      GOTO 85 ;
                    END ;
                  END ;
                  IF COMMAND = 'SUBCLUSTER' THEN BEGIN
                    WRITELN(' COMMAND DESTROYS WORKING ',
                            'DATABASE. CONTINUE? (YES/NO)') ;
                    READLN ;
                    GETALFA(COMMAND) ;
                    IF COMMAND <> 'YES       '
                      THEN GOTO 10 ;
                    BUILDSUBCLUSTER ;
                    GOTO 10 ;
                  END ;
                  IF COMMAND = 'COMPARE   ' THEN BUILDMATRIX ;
                  IF COMMAND = 'SORT USERS' THEN SORTUSERS ;
                END ;
              END ;
              GOTO 10 ;
            END ;
          END ;  (* ELSE *)
        END ;
      END ;
    END ;
  END ;
99:
END.

Appendix C

Distributed System Clusters


The system clusters are listed on the following pages
in the similarity matrix format illustrated in Figure 2-5.
The rows and columns are headed by the terminal identifier,
with the computed percentage of similarity for two users
being located at the intersection of the row of one and the
column of the other. At the bottom of each cluster matrix
are the cluster name and the average cluster similarity
value. The computation of this average value is described
in the Relationship Evaluation section of Chapter IV.

A DISTRIBUTED PROCESSING DESIGN FOR THE PERSONNEL DATA SYSTEM'S--ETC(U)
DEC 81  R V BRANDT

Scott AFB Cluster # 1
Average Similarity 64.4%


Average Similarity 69.2%


AFMPC Cluster # 11
Average Similarity 71.8%


AFMPC Cluster # 13
Average Similarity 49.8%


AFMPC Cluster # 15
Average Similarity 49.9%


Randolph AFB Central Cluster
Average Similarity 50.5%

Appendix D

Software User's Guide


Introduction


This software package (Appendix B) supports the actions
necessary to create and maintain a database of doubly linked
records, and the analysis of those records for the generation
of clusters. As implemented for this thesis, the database
consists of user records and transaction records. The
clustering and comparison capabilities described later support
generation of similarity matrices based on the percentage of
transaction records which users in the "working database"
have in common.
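
As a rough illustration of the "percentage in common" measure,
the following Python sketch (not the thesis's Pascal) mirrors
the ratio that BUILDMATRIX in Appendix B appears to compute:
the number of shared transactions divided by the number of
distinct transactions held by either user. The function name,
the dictionary layout, and the sample identifiers are
assumptions made for the example only.

```python
def similarity(trans_a, trans_b):
    """Return common / (count_a + count_b - common) for two users."""
    set_a, set_b = set(trans_a), set(trans_b)
    common = len(set_a & set_b)
    denominator = len(set_a) + len(set_b) - common
    # The denominator is zero only when neither user has any transactions.
    return common / denominator if denominator else 0.0


# Hypothetical work area: user identifier -> transaction identifiers.
work_area = {
    "TERM01": ["PA1", "PA2", "PA3"],
    "TERM02": ["PA2", "PA3", "PA4"],
    "TERM03": ["QX9"],
}

# Print one matrix row per user, one column per user.
for a in work_area:
    row = [f"{similarity(work_area[a], work_area[b]):4.2f}" for b in work_area]
    print(a, " ".join(row))
```

The sketch prints every pair; the Pascal listing writes a pair
to MTRIXFIL only when its value exceeds the cutoff SIMLIMIT.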

The software is written in Pascal for the Cyber computer
and functions in an interactive mode with the user. To
support large-scale efforts in which a database is created
over several sessions, the package can save the current
working database and subsequently use it as initial input
for the addition of more records.

Concept of Operation

This package has been developed with minimal internal
data editing to provide more generalized capabilities.
Therefore, although this user's guide will continue to refer
to the data items, variables, and records as they were
implemented, the reader should concentrate primarily on the
capabilities provided rather than the implementation-specific
aspects.

User interaction is controlled through the main program
command menu, which consists of the twelve actions supported.
Each of these menu items is described in the following
paragraphs. Where some logical progression of actions was
intended, the sequence used during the thesis is described
along with the justification.


Fig D-1. Command Menu

Read Input. The READ INPUT command causes the program
to read records from the local file DATAFIL and to create
an initial linked list of user and transaction records in
the program work area. Absence of DATAFIL, or improperly
formatted records in the file, will cause abnormal termination
of the program. The record layout for records in the file
is shown in Figure D-2 below. The last record in the file
must contain END in columns 1-3. The transaction count in
column 22 is the count for transactions on that specific
DATAFIL record, and does not restrict the number of
transaction records that may be linked to a user record. If a
user record has thirty transactions linked to it, these
transactions will be on successive records in DATAFIL, each
with a count entry stating the number of transactions on that
specific DATAFIL record for the user. After reading the entire
DATAFIL and encountering the END record, the message "COUNT
OF INCOMING USERS: XXX" will be displayed and the command
menu will be displayed again.

Starting                                     Range of
Column     Length   Contents                 Values

    1        10     User Identifier          A/N
   11        10     User Office              A/N
   21         1     Blank
   22         1     Transaction Count        0-9
   23         1     Blank
   24        10     Transaction Identifier   A/N
   34        10     Transaction Identifier   A/N
  104        10     Transaction Identifier   A/N

Fig D-2. DATAFIL Record Layout
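
The record layout can be made concrete with a short Python
sketch (again, not the thesis's Pascal). It assumes fixed-width
records as in Figure D-2 and, following the READOLDFILE listing
in Appendix B, treats a record whose user identifier repeats
the previous one as a continuation that carries no office
field. The function name `parse_datafil` and the sample data
are hypothetical.

```python
def parse_datafil(lines):
    """Build {user id: {"office": str, "transactions": [ids]}} from records."""
    users = {}
    current = None
    for line in lines:
        user_id = line[0:10].rstrip()          # columns 1-10
        if user_id == "END":                   # END record closes the file
            break
        if user_id == current:
            rest = line[10:]                   # continuation: no office field
        else:
            current = user_id
            users[user_id] = {"office": line[10:20].rstrip(),  # columns 11-20
                              "transactions": []}
            rest = line[20:]
        count = int(rest[1])                   # blank, one-digit count, blank
        body = rest[3:]
        for i in range(count):
            trans_id = body[10 * i:10 * i + 10].rstrip()
            users[user_id]["transactions"].append(trans_id)
    return users


# Hypothetical records, padded to the fixed field widths.
sample = [
    f"{'SMITH':<10}{'DPMDB':<10} 2 {'TR01':<10}{'TR02':<10}",
    f"{'SMITH':<10} 1 {'TR03':<10}",
    f"{'JONES':<10}{'DPMDA':<10} 0 ",
    f"{'END':<10}",
]
print(parse_datafil(sample))
```

Unlike the Pascal program, which terminates abnormally on a
malformed record, this sketch would raise a Python exception;
neither performs real validation.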

Save Data. The SAVE DATA command creates a new local
output file (DATAFIL) containing records as described in
Figure D-2. These records are built by extracting
information from the records in the program work area existing
when the command is issued. As implemented, the transaction
count in column 22 of each record will be between zero and
five, to allow display of the records on a terminal screen
without encountering wrap-around. Completion of the SAVE DATA
operation is indicated by display of the message "OUTPUT USER
COUNT: XXX" and redisplay of the command menu.
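
The five-per-record limit can be sketched in Python as follows.
This is an illustration based on the SAVEDATA listing in
Appendix B, not a transcription of it: the function name
`format_datafil`, the tuple layout of the input, and the sample
identifiers are assumptions. Continuation records are assumed
to carry the user identifier but no office field.

```python
MAX_PER_RECORD = 5   # corresponds to MAXCOUNT in SAVEDATA


def format_datafil(users):
    """users: list of (user id, office, [transaction ids]) tuples."""
    records = []
    for user_id, office, transactions in users:
        prefix = f"{user_id:<10}{office:<10}"
        if not transactions:
            records.append(prefix + " 0 ")
            continue
        for start in range(0, len(transactions), MAX_PER_RECORD):
            chunk = transactions[start:start + MAX_PER_RECORD]
            ids = "".join(f"{t:<10}" for t in chunk)
            records.append(f"{prefix} {len(chunk)} {ids}")
            prefix = f"{user_id:<10}"   # later records repeat only the id
    records.append(f"{'END':<10}")
    return records


# Hypothetical user with seven transactions: two records plus END.
for record in format_datafil([("SMITH", "DPMDB",
                               ["T1", "T2", "T3", "T4", "T5", "T6", "T7"])]):
    print(repr(record))
```

With ten-character fields, a full record occupies 73 columns,
which is why five transactions fit on an 80-column screen.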

Add User. The ADD USER command supports creation of
new user records linked into the list of user records
presently in the work area. Upon input of the command a prompt
("ENTER USER ID OR END") is displayed. The desired user
identifier (not more than ten alphanumeric characters) should
be input. If END is entered, control will return to the
command menu. The user identifier entered is matched to the
user identifiers currently in the work area. If a match is
found, a message ("USER ALREADY EXISTS") is displayed and
the prompt for entry of the user identifier will be repeated.
If the user identifier is unique, a new user record is created
and linked to any others in the work area. Once completed,
the prompt for entry of a new user identifier will be
displayed to allow continuing creation of user records without
redisplay of the command menu.

Del User. The DEL USER command permits selective
deletion of user records and their associated transaction
records from the current work area. Input of the command
will be followed by display of the prompt message "ENTER
USER ID OR END." Entry of END will cause control to return
to the command menu. If a user identifier is entered, the
current work area is searched for a match. If no match is
found, the message "USER DOES NOT EXIST" is displayed and the
prompt for entry of the user identifier is repeated. When a
match is found, the user record and all transaction records
linked to it are deleted from the work area and the message
"USER DELETED- XXXXXXXXXX" is displayed. Completion of the
deletion is followed by redisplay of the prompt for entry of
the user identifier. This allows continual input of deletions
without returning to the main command menu.

Add Trans. The ADD TRANS command allows creation of a
new transaction record to be linked to a specific user record.
Entry of the command is followed by the prompt "ENTER USER ID
OR END". The user identifier should be entered and the work
area will be searched for a record match. When the user
record is found, the prompt "ENTER TRANSACTION ID OR END" will
be displayed. The transaction identifier will be checked
for duplication in the user record and, if not a duplicate, a
transaction record is created and linked to the user record.
The record creation is followed by display of the prompt for
entry of a new transaction identifier if more are to be added.
If the transaction entered is already linked to the user
record, the message "USER ALREADY HAS THIS TRANSACTION" is
displayed, and the prompt is repeated. Entry of END for
either the transaction or user prompt causes return to the
command menu.

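
The two checks made by ADD TRANS can be summarized in a small
Python sketch. The dictionary-of-lists work area and the
function name `add_transaction` are assumptions for
illustration; the messages are those quoted above, and the new
identifier is placed at the head of the list because the
BUILDTRANS listing in Appendix B links new records there.

```python
def add_transaction(work_area, user_id, trans_id):
    """Return the message ADD TRANS would display, or None on success."""
    if user_id not in work_area:
        return "USER DOES NOT EXIST"
    if trans_id in work_area[user_id]:
        return "USER ALREADY HAS THIS TRANSACTION"
    work_area[user_id].insert(0, trans_id)   # link at the head of the list
    return None


# Hypothetical work area with one user and one transaction.
work_area = {"SMITH": ["TR01"]}
print(add_transaction(work_area, "SMITH", "TR02"))   # new transaction
print(add_transaction(work_area, "SMITH", "TR02"))   # duplicate
print(add_transaction(work_area, "JONES", "TR01"))   # unknown user
```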
Del Trans. The DEL TRANS command allows deletion of a
specific transaction record from a user record. In response
to the command, the "ENTER USER ID OR END" prompt is
displayed to identify the user involved. The input user
identifier will be checked for a match to the records in the
work area and the prompt "ENTER TRANSACTION ID OR END" will be
displayed if a match is found. The transaction identifier
entered is used to check all transaction records linked to that
user. If a match is found, the transaction record is deleted
and the message "TRANSACTION DELETED- XXXXXXXXXX" is displayed,
followed by return to the prompt for entry of another
transaction identifier. Input of END to either prompt message
will cause control to return to the command menu. If no match
on the user or the transaction identifier is found, an
appropriate message is displayed and the prompt is repeated.

Subcluster.  The  SUBCLUSTER  command  is  used  to  reduce 
the  current  work  area  file  to  only  those  user  records  with 
the  office  identifier  specified  in  response  to  the  "ENTER 
OFFICE  ID  OR  END"  message.  All  transaction  records  linked 
to  the  selected  user  records  are  also  retained  for  following 
analysis  activities.  Input  of  the  SUBCLUSTER  command  is 
followed  by  display  of  a  message  warning  that  use  of  this 
command  destroys  the  currently  existing  work  area  database. 

The  warning  is  intended  to  remind  the  user  that  the  current 
work  area  may  need  to  be  saved  before  proceeding  with  the 
subcluster  process.  If  YES  is  answered  to  the  warning
message,  the  prompt  for  entry  of  the  office  identifier  is
displayed.  The  office  identifier  is  used  to  search  the  work
area  and  create  a  new  linkage  pattern  which  includes  only 
those  user  records  with  the  specific  office  identifier. 
Completion  of  the  subcluster  process  is  indicated  by  return 
to  the  command  menu  display.  Since  no  data  edits  are  applied 
to  the  actual  data  in  a  user  record,  the  data  item  OFFICE 
could  contain  location  information  or  the  building  or  room 
number  of  the  user.  Performing  the  subcluster  process  based 
on  such  an  attribute  could  then  be  followed  by  use  of  the 
COMPARE  command  to  create  similarity  matrices  for  the
clustered  users.

Cluster.  The  CLUSTER  command  is  similar  to  the  subcluster
just  described  in  that  it  allows  the  work  area  database  to
be  reduced  in  size  based  on  common  user  attributes.  The
CLUSTER  command  allows  input  of  transaction  identifiers  which
are  used  to  determine  which  user  records  will  be  retained  in
the  work  area.  Entry  of  the  command  is  followed  by  display
of  the  prompt  "ENTER  TRANSACTION  ID  OR  END."

When  the  transaction  identifier  is  entered,  only  user  records
with  that  transaction  identifier  linked  to  them  are  retained.
Once  the  work  area  has  been  reconfigured  based  on  one
transaction  identifier,  the  prompt  is  repeated  and  a  new
transaction  identifier  may  be  input  to  be  applied  to  the
remaining  user  records.  This  capability  was  developed  for  the
case  in  which  a  distributed  system  node  has  been  designed  to
support  a  fixed  set  of  functions  or  transactions,  and  the
similarity  matrix  process  is  to  be  used  only  on  those  user
records  requiring  all  of  the  available  functions.  Once  two
select  transaction  identifiers  have  been  applied  to  the
database,  a  list  of  all  users  remaining  in  the  work  area  is
displayed.  In  this  way,  the  user  may  continue  to  apply  more
selection  identifiers  until  only  the  desired  number  of  user
records  remain.  Input  of  END  to  the  prompt  message  causes
return  of  control  to  the  command  menu.
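
The  successive  narrowing  performed  by  CLUSTER  can  be  illustrated  with  the  sketch  below.  It  assumes  the  work  area  is  a  dictionary  from  user  identifier  to  a  set  of  transaction  identifiers;  the  function  name  and  data  layout  are  hypothetical.

```python
# Hypothetical stand-in for the work area: user id -> set of transaction ids.
def cluster(work_area, trans_ids):
    """Apply each transaction id in turn, keeping only users linked to it."""
    remaining = dict(work_area)
    for trans_id in trans_ids:
        remaining = {
            user_id: linked
            for user_id, linked in remaining.items()
            if trans_id in linked
        }
    return remaining


work_area = {
    "USER01": {"TRANA", "TRANB", "TRANC"},
    "USER02": {"TRANA", "TRANB"},
    "USER03": {"TRANA"},
}
```

Applying  "TRANA"  and  then  "TRANB"  leaves  USER01  and  USER02;  adding  "TRANC"  leaves  only  USER01.  The  users  that  survive  are  exactly  those  linked  to  every  identifier  entered.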

Compare.  The  COMPARE  command  supports  generation  of
similarity  matrices  which  depict  the  percentage  of  transaction
identifier  commonality  between  users  in  the  work  area
database.  The  computation  of  the  similarity  value  is
accomplished  by  counting  the  total  number  of  unique
transaction  identifiers  linked  to  two  user  records,  adding  the
number  of  common  transaction  identifiers  linked  to  the  two,
and  dividing  the  number  in  common  by  this  total  count.  This
comparison  process  is  very  time  consuming  and  should  be  used
only  on  user  record  groups  of  some  manageable  size.  For
estimation  purposes,  a  user  group  of  125  users  and
approximately  2500  linked  transaction  records  required  almost
one  minute  of  CPU  time  for  generation  of  the  complete  matrix.
Since  an  interactive  session  is  limited  to  two  minutes  of  CPU
time  on  the  ASD  Cyber,  only  one  such  comparison  can  be
accommodated  during  a  session.  The  output  from  the  COMPARE
process  is  a  local  file  (MTRIXFIL).  The  records  on  the  output
file  are  grouped  with  a  user  identifier  on  a  line  by  itself,
followed  by  records  with  four  users  and  their  associated
similarity  values  as  compared  to  that  user.  The  BUILDMATRIX
procedure  is  used  to  create  the  matrix  file  and  compute  the
similarity  values.

The  users  to  be  included  in  the  output  file  can  be  controlled 
based  on  the  amount  of  similarity  that  exists  between  two 
users.  The  BUILDMATRIX  constant  SIMLIMIT  is  used  to  compare 
against  the  computed  similarity  value,  and  if  the  computed 
value  is  greater  than  SIMLIMIT  the  user  identifier  and  the 
computed  value  will  be  written  to  the  file.  If  the  computed 
value  is  less  than  or  equal  to  SIMLIMIT  no  write  occurs. 
Completion  of  the  compare  process  is  indicated  by  return  to 
the  command  menu. 
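
The  similarity  computation  and  the  SIMLIMIT  filter  can  be  sketched  as  follows.  This  is  one  reading  of  the  description,  not  the  BUILDMATRIX  source:  it  assumes  "unique"  means  identifiers  linked  to  only  one  of  the  two  users,  so  that  the  divisor  is  the  number  of  distinct  identifiers  across  both.  The  value  given  to  SIMLIMIT,  the  function  names,  and  the  dictionary  layout  are  all  hypothetical.

```python
SIMLIMIT = 40.0  # hypothetical threshold, expressed in percent


def similarity(a, b):
    """Percentage of transaction identifiers two users have in common."""
    common = len(a & b)        # identifiers linked to both users
    only_one = len(a ^ b)      # identifiers linked to exactly one of them
    total = only_one + common  # same as the number of distinct identifiers
    return 100.0 * common / total if total else 0.0


def build_matrix(work_area, simlimit=SIMLIMIT):
    """Map each user to the (other user, similarity) pairs above simlimit."""
    matrix = {}
    for user_id, linked in work_area.items():
        row = []
        for other_id, other_linked in work_area.items():
            if other_id == user_id:
                continue
            value = similarity(linked, other_linked)
            if value > simlimit:  # values <= simlimit are not written
                row.append((other_id, value))
        matrix[user_id] = row
    return matrix


work_area = {
    "USER01": {"TRANA", "TRANB", "TRANC"},
    "USER02": {"TRANB", "TRANC", "TRAND"},
    "USER03": {"TRANX"},
}
```

In  this  sample,  USER01  and  USER02  share  two  of  four  distinct  identifiers,  giving  a  value  of  50,  which  exceeds  the  assumed  limit;  USER03  shares  nothing  and  gets  an  empty  row.  The  nested  loop  also  shows  why  the  process  is  costly:  125  users  yield  125  x  124  =  15,500  ordered  pairs,  or  7,750  if  each  unordered  pair  is  computed  only  once.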

Sort  Users.  The  SORT  USERS  command  is  used  to  sort  the
user  records  in  the  work  area  database  into  order  based  on
ascending  user  identifier.  This  command  has  no  impact  on  the
file  DATAFIL;  thus  the  sorted  file  should  be  saved  upon
completion  of  the  command.  Completion  of  the  command  is
indicated  by  return  to  the  command  menu.

Add  Office.  The  ADD  OFFICE  command  permits  entry  of 
ten  characters  of  identifying  data  into  the  user  record  data 
item  OFFICE.  As  mentioned  earlier,  no  editing  on  the  input 
data  is  performed  so  any  type  of  identification  data  can  be 
placed  in  this  data  item  to  be  used  as  the  select  parameter 
for  the  SUBCLUSTER  command.  In  response  to  the  ADD  OFFICE
command  the  prompt  "ENTER  OFFICE  ID  OR  END"  is  displayed. 

Once  the  office  identifier  is  entered,  the  prompt  "ENTER  USER
ID  FOR  OFFICE,  OR  END"  is  displayed.  The  user  identifier  for
the  record  to  receive  this  office  identifier  should  be
entered.  The  work  area  is  searched  for  the  user  record  and,
when  it  is  found,  the  office  identifier  is  stored.  Completion
of  this  process  is  followed  by  display  of  the  prompt  for
entry  of  a  user  identifier  so  that  multiple  user  records  may
receive  the  same  office  identifier  if  so  desired.  If  no  match
on  the  user  identifier  is  found,  an  appropriate  message  is
displayed  and  the  prompt  for  entry  of  a  user  identifier  is
repeated.  Input  of  END  causes  return  of  control  to  the
command  menu.
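
The  ADD  OFFICE  loop  might  look  like  the  sketch  below.  It  assumes  a  dictionary-based  work  area  in  which  each  user  record  has  an  "office"  field.  Truncating  the  input  to  ten  characters  is  an  assumption  made  here  to  reflect  the  ten-character  data  item;  how  the  package  handles  longer  input  is  not  stated.  The  function  name  is  hypothetical.

```python
OFFICE_LENGTH = 10  # the OFFICE data item holds ten characters


def add_office(work_area, office_id, user_ids):
    """Store one office id in several user records; return unmatched ids."""
    office = office_id[:OFFICE_LENGTH]  # truncation is an assumption
    not_found = []
    for user_id in user_ids:
        if user_id in work_area:
            work_area[user_id]["office"] = office  # no edit checks on content
        else:
            not_found.append(user_id)  # the package reports no match here
    return not_found


work_area = {
    "USER01": {"office": "", "transactions": {"TRANA"}},
    "USER02": {"office": "", "transactions": {"TRANB"}},
}
```

Passing  several  user  identifiers  in  one  call  mirrors  the  repeated  user  prompt,  which  lets  one  office  identifier  be  stored  in  many  records  without  re-entering  it.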

End.  The  END  command  causes  termination  of  the
interactive  program.  Prior  to  input  of  this  command,  the  user
should  have  saved  the  work  area  database  if  it  contains  data 
to  be  retained. 

Summary 

This  software  package  has  been  developed  to  be  as  flexible
as  possible.  Changes  made  to  the  record  data  item  names  for
users  and  transactions  should  in  many  instances  provide  the
package  user  with  a  clear  depiction  of  the  attributes  being
created  and  maintained.  The  modular  construction  is  intended
to  allow  additional  capabilities  to  be  added  as  further
investigation  into  distributed  processing  concepts  provides
analysis  methodologies.